Druidic Imply launches Shapeshift project for modern analytics

Druidic Wickerman
Wikipedia public domain image: https://en.wikipedia.org/wiki/Druid#/media/File:The_Wicker_Man_of_the_Druids_crop.jpg

Real-time analytics database startup Imply has unveiled Project Shapeshift, an initiative to develop a hardware-abstracting, auto-scaling control plane and SaaS offering for Apache Druid, the open source software on which Imply's products are based.

It will extend its SQL API from querying to ingestion, processing and transformation and so simplify development cycles. Imply will also build a serverless and elastic consumption experience for Apache Druid. There will be product updates for both Druid and Imply over the next year as part of this Project Shapeshift initiative.
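For context, the querying side of that SQL API is already exposed over HTTP in Apache Druid, at the /druid/v2/sql endpoint. The following is a minimal sketch of how a client might build such a request. The host and port, the "wikipedia" datasource and the build_sql_request helper are assumptions made for illustration, and the SQL-based ingestion and transformation described above are not shown because the article gives no syntax for them.

```python
import json
import urllib.request

# Assumption: a Druid router reachable at this hypothetical address.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Build (but do not send) an HTTP POST carrying a Druid SQL query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource and columns, used only to show the request shape.
request = build_sql_request(
    'SELECT channel, COUNT(*) AS edits FROM "wikipedia" '
    "GROUP BY channel ORDER BY edits DESC LIMIT 5"
)

# Against a live cluster, the request could be sent with:
#     urllib.request.urlopen(request).read()
```

Nothing is sent over the network here; the sketch only shows that a query is a JSON document with a "query" field posted to a single endpoint.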

Imply CTO Gian Merlino said in a statement: “Establishing a path forward to massive adoption of a new data infrastructure lies in a strong commitment to advancing the underlying open source technology combined with a dedication to re-engineer the very foundation of that technology to be truly cloud-native.”

Apache Druid is an open source, real-time analytics database. It supports data streamed from the Kafka and Amazon Kinesis message buses, batch loading from HDFS and S3, and many popular file formats. Imply was started to build a software business that adds functionality and services to Druid, creating a kind of combined real-time data warehouse and search system.


Imply was founded in 2015 by CEO Fangjin Yang, chief experience officer Vadim Ogievetsky, and Merlino, who were creators of Apache Druid. It has taken in a substantial $115.3 million in funding across seed ($2 million), A ($13.3 million), B ($30 million) and C ($70 million this year) rounds. This is yet more evidence, after the Snowflake funding and IPO saga, of the strong pull that analytics software exerts on venture capitalists.

Apache Druid uses inverted indexes (in particular, compressed bitmaps) for fast searching and filtering and can support numerical aggregations, groupBys (including multi-dimensional groupBys), and other analytic workloads faster and more efficiently than search systems. A Druid FAQ answers basic questions about it and a Wikipedia entry answers even more basic questions.
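To illustrate the bitmap idea in general terms, here is a minimal sketch of an inverted index in which each column value maps to a bitmap of matching row numbers. It is not Druid's implementation: Druid uses compressed bitmaps, whereas this sketch uses plain Python integers as uncompressed bitsets, and the rows, column names and helper functions are all invented for the example.

```python
from collections import defaultdict

# Hypothetical event rows; the columns and values are illustrative only.
rows = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "DE", "device": "desktop", "clicks": 5},
    {"country": "US", "device": "desktop", "clicks": 2},
    {"country": "US", "device": "mobile", "clicks": 7},
]


def build_bitmap_index(rows, column):
    """Map each distinct value in `column` to an integer used as a bitset.

    Bit i is set when row i holds that value.
    """
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


def matching_row_ids(bitmap):
    """Yield the positions of the set bits, lowest first."""
    row_id = 0
    while bitmap:
        if bitmap & 1:
            yield row_id
        bitmap >>= 1
        row_id += 1


country_index = build_bitmap_index(rows, "country")
device_index = build_bitmap_index(rows, "device")

# A filter on two columns becomes a single bitwise AND of two bitmaps.
us_mobile = country_index["US"] & device_index["mobile"]

# Aggregate only over the rows the combined bitmap selects.
total_clicks = sum(rows[i]["clicks"] for i in matching_row_ids(us_mobile))
```

The filter never scans the rows themselves. It combines two precomputed bitmaps and then touches only the matching rows for the aggregation, which is the property that makes this kind of index useful for filtering ahead of numerical aggregations and groupBys.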

Imply will develop a substantial architectural expansion on top of Apache Druid, to provide more flexibility and analytics capabilities for applications. There will be a multi-stage, decoupled query layer integrated with the core Druid database engine. This will help developers support all of their analytics requirements for their applications with one platform.

The company will also improve ease of use across data ingestion, queries and cluster operations, to deliver what it claims will be the most developer-friendly database for analytics applications. It will improve reporting for large result sets and long-running queries, as well as complex conditional alerting across millions of objects.

A video provides more background about Imply and its Druid work, as does a blog post by Fangjin Yang.