Dremio data lakes can run on VAST Data storage

VAST Data has partnered with Dremio so that Dremio's lakehouse software can run on-premises, accessing data stored on VAST's Universal Storage all-flash file and object array.

Dremio's software runs in memory and queries source data where it sits, in a so-called data lake, without extract, transform and load (ETL) procedures to select, reformat and transfer that data into a data warehouse for analysis. The company offers Dremio Cloud, which runs on AWS or Azure and supports SQL workloads. This is a free data lakehouse service, with data persisted in Apache Iceberg format tables stored as Apache Parquet columnar files. There is an enterprise version as well.


Roger Frey, Dremio’s VP of Alliance, said: “Partnering with VAST ensures Dremio users are equipped with the lakehouse data capacity and scalable high performance necessary to run their business intelligence workloads and data analytics applications.”

VAST Data CMO and co-founder Jeff Denworth added: “We continue to see high market demand to underpin organizations’ modern data analytics infrastructure with VAST … Partnering with Dremio ensures that our mutual customers have an optimized and simple out-of-the-box experience as they embrace a cloud-native architecture for their rapidly evolving data management needs.”

VAST Data's Universal Storage is a massively parallel, disaggregated design with a single tier of NVMe QLC (4 bits/cell) flash, plus storage-class memory used for metadata and for buffering incoming writes. A scale-out set of stateless compute nodes can each access the entire stored data set. That data is compressed and deduplicated to bring the system's cost down to disk array levels.

VAST Data typically sells its storage to largish enterprises wanting lots of all-flash capacity. Adding Dremio's software to its portfolio could appeal to those customers and could drive upgrades to their installed VAST capacity.

The two companies say the combination provides “a highly scalable and affordable all-flash, file and object platform that allows you to run petabyte scale analysis at less than half of the cost of traditional all-flash solutions, while being many times faster.”

We think the reference to traditional all-flash solutions means systems from suppliers such as NetApp and Pure Storage.

There should be no array resource contention with other workloads because the “dedicated QoS for Dremio’s sub-engines ensures demanding analytics jobs don’t prevent queries or dashboards from loading.” The sub-engines include an XL engine for exploratory analytics, an M engine for marketing, and an L engine for exec dashboards.

With VAST intending to port its software to the public cloud, we can look forward to Dremio users accessing VAST storage there too, hopefully with faster access than native public cloud file or object storage provides.

Check out a downloadable Solution Brief document for more information.