Imply accelerates massive real-time analytics


Startup Imply has made its parallel query engine generally available to speed up reporting from massive real-time Apache Druid databases. It has also added SQL-based ingestion, introduced a total cost of ownership pledge, and announced more than 250 customers for Polaris, its Druid-based cloud database service.

The open source Apache Druid database stores huge sets of streaming and historical data and delivers real-time answers to analytics queries. Druid's code originators founded Imply and put a multi-stage query engine into private preview in March. Queries are split into stages and run in parallel across distributed servers using a so-called shuffle mesh framework. The engine is now generally available.
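
To make "split into stages" and "shuffle" concrete, here is a minimal single-process sketch of a two-stage GROUP BY/SUM query. It is not Imply's or Druid's code: the function names, the CRC32 hash partitioning, and the two-partition default are illustrative assumptions. Stage one partially aggregates each data segment, a shuffle routes every key to one downstream partition, and stage two finishes the aggregation.

```python
from collections import defaultdict
from zlib import crc32


def stage_scan(rows, num_partitions):
    """Stage 1 (one worker): partially aggregate its rows, then partition by key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    # Route each partial sum to a downstream partition by a stable hash of its key
    partitions = [dict() for _ in range(num_partitions)]
    for key, total in partial.items():
        partitions[crc32(key.encode()) % num_partitions][key] = total
    return partitions


def shuffle(worker_outputs, num_partitions):
    """For each partition, gather the piece every stage-1 worker produced for it."""
    return [[out[p] for out in worker_outputs] for p in range(num_partitions)]


def stage_merge(pieces):
    """Stage 2 (one worker per partition): combine partial sums into final sums."""
    final = defaultdict(int)
    for piece in pieces:
        for key, total in piece.items():
            final[key] += total
    return dict(final)


def run_query(row_segments, num_partitions=2):
    """GROUP BY key, SUM(value) as two stages joined by a shuffle.

    A real engine would run each stage's workers on separate servers in
    parallel; here they simply run one after another in one process.
    """
    stage1 = [stage_scan(seg, num_partitions) for seg in row_segments]
    stage2 = [stage_merge(p) for p in shuffle(stage1, num_partitions)]
    result = {}
    for part in stage2:
        result.update(part)  # partitions hold disjoint keys, so nothing collides
    return result


segments = [[("us", 3), ("eu", 1)], [("us", 2), ("apac", 7)], [("eu", 10)]]
print(sorted(run_query(segments).items()))  # [('apac', 7), ('eu', 11), ('us', 5)]
```

Because every occurrence of a key lands in the same partition, the stage-two workers never need to talk to each other, which is what lets the final stage scale out alongside the first.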

Gian Merlino, Imply CTO and co-founder and PMC chair for Apache Druid, said: “We always thought of Druid as a shapeshifter when we originally built it to support analytics apps of any scale. Now we’re excited to show the world just how nimble it can be with the addition of multi-stage queries and SQL-based ingestion.”


Apache Druid v24.0, with its multi-stage query engine, also brings:

  • Easier and up to 65 percent faster data ingestion using common SQL queries. This matters when Druid databases can ingest hundreds of terabytes a day
  • In-database transformation using SQL, with no tuning or specialist expertise required, enabling data enhancement and enrichment, experimentation with aggregates, approximations (including HyperLogLog and theta sketches), and more
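
Theta sketches, one of the approximations mentioned above, build on the k-minimum-values (KMV) idea: hash every item to a number in [0, 1), keep only the k smallest distinct hashes, and estimate the distinct count from how tightly those k values are packed. The plain-Python sketch below illustrates that idea only; it is not the Apache DataSketches library or Druid's implementation, and the SHA-1 hashing and default k=1024 are assumptions.

```python
import hashlib
import heapq


def unit_hash(item):
    """Map an item to a stable pseudo-random float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps the k smallest distinct hash values seen (a k-minimum-values sketch)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of retained hashes, stored negated
        self._members = set()  # the same hashes, for O(1) duplicate checks

    def _add_hash(self, h):
        if h in self._members:
            return  # a repeated item does not change a distinct count
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Evict the largest retained hash to make room for the smaller one
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        self._add_hash(unit_hash(item))

    def merge(self, other):
        """Set union: keep the k smallest hashes across both sketches."""
        merged = KMVSketch(min(self.k, other.k))
        for h in sorted(self._members | other._members)[: merged.k]:
            merged._add_hash(h)
        return merged

    def estimate(self):
        """Exact below k distinct values; otherwise (k - 1) / (k-th smallest hash)."""
        if len(self._heap) < self.k:
            return float(len(self._heap))
        return (self.k - 1) / (-self._heap[0])
```

The sketch's size stays fixed at k hashes however many rows arrive, and two sketches built on different data can be merged, which is why building them once at ingestion can make later approximate distinct-count queries cheap.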

Imply says Druid now has a foundation for integration with open source and commercial data tools, covering transformation (dbt), data integration (Informatica, Fivetran, Matillion, Nexla), data quality (Great Expectations, Monte Carlo, Bigeye), and others.

Total value guarantee

Imply says Apache Druid users already bear a total cost of ownership (TCO) made up of software, support, and infrastructure. It is introducing a Total Value Guarantee which, it says, guarantees that for qualified participants the TCO of running Druid with Imply will be less than that existing cost. What is a qualified participant? A web page should provide that information (it wasn't live when we checked).
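
Read literally, the guarantee compares two sums of the same three cost components. The symbols below are our own shorthand, not Imply's terms, and the exact terms of the pledge depend on the qualification details Imply has yet to publish:

```latex
\mathrm{TCO} = C_{\text{software}} + C_{\text{support}} + C_{\text{infrastructure}},
\qquad
\mathrm{TCO}_{\text{Druid with Imply}} < \mathrm{TCO}_{\text{existing Druid}}
```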

Vadim Ogievetsky, Imply CXO and co-founder, said: “Now with Imply’s Total Value Guarantee, developers can get a partner for Druid that will help them get all the advantages of Imply’s products and services and be there in the middle of the night if needed – with Imply effectively for free.”

Some Imply customers.

Imply said it now has more than 250 customers for its Polaris cloud Druid database service, which was introduced last March. Polaris has been updated with:

  • Support for schemaless ingestion to accommodate nested columns, allowing for arbitrary nesting of typed data like JSON or Avro.
  • DataSketches support at ingestion for faster, sub-second approximate queries
  • Performance monitoring alerts to help keep performance consistent for ultra-low-latency queries
  • Greater security through resource-based access control and row-level security
  • Updates to Polaris' built-in visualization that enable faster slicing and dicing
  • New node types to flexibly meet price/performance requirements at any scale
  • Service hibernation for cost savings
  • Comprehensive consumption and billing metrics for instant usage visibility
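
Imply hasn't said how Polaris stores nested columns internally. As a hypothetical illustration of what schemaless ingestion of nested data involves, the sketch below walks each JSON record, discovers every leaf under a JSONPath-style path, and records the value and its type per path, with no schema declared up front. The function names and the column layout are assumptions made for this example.

```python
import json


def flatten_paths(value, path="$"):
    """Yield (path, leaf value, type name) for every leaf in a nested value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_paths(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_paths(child, f"{path}[{index}]")
    else:
        yield path, value, type(value).__name__


def ingest_nested(raw_rows):
    """Build one column per discovered path, keeping every type seen for it."""
    columns = {}
    for row_number, raw in enumerate(raw_rows):
        for path, leaf, type_name in flatten_paths(json.loads(raw)):
            column = columns.setdefault(path, {"types": set(), "values": {}})
            column["types"].add(type_name)
            column["values"][row_number] = leaf  # sparse: only rows that have the path
    return columns


rows = [
    '{"user": {"id": 7, "tags": ["a", "b"]}}',
    '{"user": {"id": "x9"}, "ok": true}',
]
columns = ingest_nested(rows)
print(sorted(columns["$.user.id"]["types"]))  # ['int', 'str']
```

Recording every observed type per path, rather than rejecting a row whose `id` switches from integer to string, is one way to stay schemaless; a production engine would also need compact per-type storage and indexes for each path.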

Imply said its short-term roadmap includes reporting that involves very large result sets, long-running queries, and CPU-heavy but infrequent runs, plus an alerting function. The alerting function will track large numbers of objects with complex conditions and scale to millions of alerts.