VCs follow the ($28bn) Databricks road, because of the wonderful things it does

Databricks has raised $1bn in a G-round that values the data lake analytics startup at a ginormous $28bn. The long list of investors includes AWS and Salesforce among a clutch of financial institutions.

Databricks enables fast SQL querying and analysis of data lakes without having to first extract, transform and load data into data warehouses. The company claims its “Data Lakehouse” technology delivers 9X better price performance than traditional data warehouses. Databricks supports AWS and Azure clouds and is typically seen as a competitor to Snowflake, which completed a huge IPO in September 2020.

Ali Ghodsi, CEO and co-founder of Databricks, said in the funding announcement: “We’ve worked with thousands of customers to understand where they want to take their data strategy, and the answer is overwhelmingly in favour of data lakes. The fact is that they have massive amounts of data in their data lakes and with SQL Analytics, they now can actually query that data by connecting directly to their BI tools like Tableau.”

The result is, the company says, data warehouse performance with data lake economics. “It is no longer a matter of if organisations will move their data to the cloud, but when,” Ghodsi said. “A Lakehouse architecture built on a data lake is the ideal data architecture for data-driven organisations and this launch gives our customers a far superior option when it comes to their data strategy.”

Prestissimo

Databricks’ open source Delta Lake software is built atop Apache Spark. The company’s SQL Analytics software queries data in Delta Lake, using two techniques to speed operations.

Auto-scaling endpoints for its query cluster nodes keep query latency consistently low under high user load. The software uses an improved query optimizer, and a caching layer that sits between the execution layer and the cloud object storage. It also has a ‘polymorphic vectorised query execution engine’, called Delta Engine, that executes queries quickly on large and small data sets. A Slideshare presentation deck with 88 slides delves deeper into the general technology while a video provides even more information about Delta Engine.
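
To make the caching idea concrete, here is a minimal Python sketch of a read-through cache sitting between a query executor and an object store. It is an illustration of the general technique only, not Databricks’ implementation: the class `ReadThroughCache`, the function `fetch_from_object_store` and the in-memory `FAKE_OBJECT_STORE` are all hypothetical names invented for this example, and the least-recently-used eviction policy is an assumption.

```python
from collections import OrderedDict
from typing import Callable

# Hypothetical stand-in for cloud object storage: file name -> file contents.
FAKE_OBJECT_STORE = {
    "part-0": b"a,b\n1,2\n",
    "part-1": b"a,b\n3,4\n",
    "part-2": b"a,b\n5,6\n",
}


def fetch_from_object_store(key: str) -> bytes:
    """Pretend to do a slow remote read (assumed interface, not a real API)."""
    return FAKE_OBJECT_STORE[key]


class ReadThroughCache:
    """Small LRU cache placed in front of a slower fetch function."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: serve locally and mark the entry as recently used.
            self.hits += 1
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: read through to the backing store and keep a copy.
        self.misses += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return data
```

An execution layer would call `cache.get("part-0")` instead of reading the object store directly; repeated reads of the same file are then served from memory, which is the effect the caching layer is meant to deliver.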