Faster SQL engine and new metastore for Dremio’s ‘data lakehouse’

Data analytics startup Dremio, which wants to kill off data warehouses, has launched Dremio Cloud, a data lakehouse service, along with the Sonar SQL engine and the Arctic lakehouse metadata store, all built on Apache open source projects.

“Data lakehouse” is a portmanteau of “data lake” and “data warehouse”. In Dremio’s sense it means running data warehouse SQL functions directly on a raw data lake, with no need for extract, transform and load (ETL) processes to pre-process the data and load it into a data warehouse such as those from Teradata and Snowflake.

Dremio CEO Billy Bosworth issued a statement: “In the past, companies had to weigh the pros and cons of data lakes vs data warehouses. We’ve eliminated this tradeoff by providing a frictionless and infinitely scalable service that delivers the combined benefits of both.

“Dremio Cloud is the world’s first lakehouse platform that was built from the ground up for SQL workloads, including mission-critical BI (Business Intelligence).”

Dremio Cloud is a free data lakehouse that persists data in Apache Iceberg tables, stored in Apache Parquet’s columnar file format. Iceberg is a high-performance format for petabyte-scale analytic tables and is said to bring SQL table simplicity to big data. Parquet includes compression and encoding, and can handle complex bulk data.
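To make the columnar idea concrete, here is a minimal Python sketch of why storing data column by column helps compression. It pivots a few rows into columns, then applies dictionary encoding and run-length encoding, two techniques Parquet is known to use. The sample rows and the function names (`to_columns`, `dictionary_encode`, `run_length_encode`) are invented for illustration; this is a toy model, not Parquet’s actual on-disk format or anything taken from Dremio.

```python
from itertools import groupby

# Hypothetical sample rows, invented for this illustration.
rows = [
    {"region": "EU", "status": "ok", "amount": 10},
    {"region": "EU", "status": "ok", "amount": 12},
    {"region": "EU", "status": "fail", "amount": 7},
    {"region": "US", "status": "ok", "amount": 30},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def dictionary_encode(values):
    """Replace each value with an integer index into a list of distinct values."""
    dictionary = []  # distinct values in first-seen order
    index = {}       # value -> position in dictionary
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
region_dict, region_codes = dictionary_encode(columns["region"])
region_runs = run_length_encode(region_codes)

print(columns["amount"])  # the amount column on its own
print(region_dict, region_codes, region_runs)
```

Because similar values sit next to each other in a column, the repeated region strings shrink to a short dictionary plus a couple of runs. A query that only needs `amount` can also skip the other columns entirely.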

An enterprise edition of Dremio Cloud offers greater security with custom roles, enterprise identity providers (e.g. Okta), and enterprise support options. 

Sonar and Arctic

Dremio says Sonar, a SQL engine powered by Iceberg, is more than twice as fast as its prior SQL engine. The idea is that customers can save money because the same workload needs only about 50 per cent of the cloud infrastructure it would otherwise have required. Sonar can work on data in S3, in the Dremio Cloud or in other data lakes, and supports SQL Data Manipulation Language (DML) insert, update, and delete operations directly on the lakehouse.
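For readers unfamiliar with DML, the three operations look like this in practice. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database purely as a stand-in engine, so it can be run anywhere. The `orders` table and its columns are hypothetical, and the exact SQL dialect and connection method for Sonar are not covered here and may differ.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database, NOT Dremio Sonar.
conn = sqlite3.connect(":memory:")

# Hypothetical table, invented for this illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
)

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO orders (id, status, amount) VALUES (?, ?, ?)",
    [(1, "new", 10.0), (2, "new", 25.5), (3, "cancelled", 7.0)],
)

# UPDATE: change existing rows in place.
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

# DELETE: remove rows that match a predicate.
conn.execute("DELETE FROM orders WHERE status = 'cancelled'")
conn.commit()

remaining = conn.execute(
    "SELECT id, status, amount FROM orders ORDER BY id"
).fetchall()
print(remaining)
```

The significance of Dremio’s claim is that statements of this kind run against tables in the lake itself, rather than against a copy loaded into a separate warehouse.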

Dremio claims that, as data is automatically optimised in Amazon S3 as it is being changed, customers no longer need to fall back to a data warehouse when data is rapidly mutating.

Tomer Shiran, Dremio founder and chief product officer, said: “Organizations can easily analyse data that resides outside the lakehouse (e.g. in relational databases) and perform live and accelerated joins across multiple sources.”

The Sonar software uses metadata and management services such as AWS Glue, Amazon’s serverless data integration service, or Dremio’s Arctic service to connect to and work with source data. Dremio says Sonar obviates the need for ETL processes and data warehouses; customers can instead work directly on data lakes using the familiar SQL language.

Arctic is in public preview with support for a variety of lakehouse engines, including Spark, Flink, Presto, Trino, and Dremio Sonar.

Dremio is offering a forever-free edition of Dremio Sonar and Dremio Arctic on Dremio Cloud, supporting unlimited production use and infinite scale, with end-to-end security and SOC 2 Type 2 compliance. 

Find out more about Dremio Cloud here and sign up for it here.