Dell enhances data lakehouse with faster query speeds and more

Dell has boosted the query speed of its data lakehouse, added and upgraded connectors, and improved monitoring and security.

In March, Dell announced a data lakehouse element of its AI portfolio that uses the Starburst Trino query engine, Kubernetes-organized lakehouse system software, and scale-out S3-compatible object storage based on Dell’s ECS, ObjectScale, or PowerScale storage products.

Starburst introduced its Warp Speed technology, with Apache Lucene indexing and caching technology, in February last year, claiming it could accelerate text-based query processing by up to 7x. It’s now come to Dell’s data lakehouse along with more connectors and other improvements.

Starburst Warp Speed diagram

Dell product manager Vrashank Jain writes: “Warp Speed is a new feature in the Dell Data Lakehouse that autonomously learns query patterns and identifies frequently accessed data to create optimal indexes and caches while keeping infrequently accessed data where it is.”

It can accelerate query performance by “between 3x and 5x for the top 20 percent of queries.”

Vrashank Jain, Dell

No data engineering is required: the data lake is indexed autonomously, and the resulting query acceleration allows higher-performance dashboards to be built and populated. The autonomous indexing “creates appropriate index types (bitmap, dictionary, tree) tailored to each data block, accelerating operations such as joins, filters, and searches. Indexes are stored on an SSD in the compute nodes for rapid access.”
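To make the bitmap index type mentioned above more concrete, here is a minimal Python sketch of how such an index can answer a filter without scanning the column. It is a toy illustration only, not Starburst's or Dell's implementation; the function names and the sample `region_column` data are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose set bits mark matching rows."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_matching(index, wanted):
    """Return row positions whose value is in `wanted`, using only the index."""
    bits = 0
    for value in wanted:
        # OR-ing bitmaps combines the matches for several values at once.
        bits |= index.get(value, 0)
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows


# Hypothetical column from one data block.
region_column = ["EU", "US", "EU", "APAC", "US", "EU"]
region_index = build_bitmap_index(region_column)

# Equivalent to: WHERE region IN ('EU', 'APAC')
eu_or_apac = rows_matching(region_index, ["EU", "APAC"])
print(eu_or_apac)
```

The filter is resolved by combining two precomputed bitmaps rather than by comparing every row, which is the general reason bitmap indexes suit low-cardinality columns.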

Jain writes: “Smart caching is a proprietary SSD columnar block caching that optimizes performance based on frequency of data usage. Caching eliminates unnecessary table scanning and provides more reuse of data between queries thus saving compute costs.

“With Warp Speed, the same cluster can run data lake queries 3x to 5x faster without requiring any change in the query by the end user. It can also help reduce cluster sizes by up to 40 percent.” Customers can either run more queries on their existing clusters or run the same volume of queries on smaller ones.
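The idea of caching blocks according to how often they are used can be sketched in a few lines of Python. This is a simplified, assumed policy for illustration; the actual Warp Speed caching logic is proprietary and is not described in detail here. The class name, the `fetch_block` callback, and the admission rule are all hypothetical.

```python
from collections import Counter


class FrequencyBlockCache:
    """Toy block cache that keeps the most frequently read blocks."""

    def __init__(self, capacity, fetch_block):
        self.capacity = capacity        # max number of cached blocks
        self.fetch_block = fetch_block  # stand-in for an object storage read
        self.hits = Counter()           # access count per block id
        self.cached = {}                # block id -> block data
        self.remote_reads = 0           # reads that went to object storage

    def read(self, block_id):
        self.hits[block_id] += 1
        if block_id in self.cached:
            return self.cached[block_id]
        data = self.fetch_block(block_id)
        self.remote_reads += 1
        self._maybe_admit(block_id, data)
        return data

    def _maybe_admit(self, block_id, data):
        if len(self.cached) < self.capacity:
            self.cached[block_id] = data
            return
        # Evict the least-used cached block only if the new one is hotter.
        coldest = min(self.cached, key=lambda b: self.hits[b])
        if self.hits[block_id] > self.hits[coldest]:
            del self.cached[coldest]
            self.cached[block_id] = data


cache = FrequencyBlockCache(capacity=2, fetch_block=lambda b: f"data-{b}")
for block in ["a", "a", "b", "a", "c", "a", "b"]:
    cache.read(block)
print(cache.remote_reads, sorted(cache.cached))
```

In this sketch a block that is read only once does not displace blocks that are read repeatedly, so infrequently accessed data stays in object storage while hot blocks are served locally.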

According to Jain, the Warp Speed feature “is only supported on data lakes that reside on Dell S3-compatible storage.”

Dell has added more enhancements to its data lakehouse:

  • Support for connecting to an existing Hive Metastore via Kerberos, enabling seamless metadata operations and enhanced data governance.
  • A Neo4j graph database connector is in public preview, and there is an improved Snowflake parallel connector for more efficient querying. 
  • Upgraded connectors to Iceberg, Delta Lake, Hive, Db2, Netezza, Redshift, SAP HANA, Snowflake, SQL Server, Synapse, and Teradata. These faster and more capable connectors support operations such as join pushdown and improved data type handling.
  • PowerScale and ObjectScale storage systems are fully validated. 
  • Dell support teams can now run an automated health check to assess the state of a customer’s cluster before or after an install or upgrade. The health check is crucial to ensuring zero downtime.
  • The Data Lakehouse can now send critical system failure alerts directly to Dell support teams for proactive handling of failure states or pending failure conditions.
  • Optional end-to-end encryption for internal components, including all the compute nodes, the cache service, and the metastore. However, this feature affects performance and should be taken into account when sizing the cluster to meet performance SLAs.
  • A five-year software subscription option in addition to the existing one- and three-year subscriptions, which will help align the lengths of hardware and software support terms to ease procurement.
  • Wider global availability with shipping to more countries across Europe, Africa, and Asia.

Warp Speed is included with existing Dell Data Lakehouse licenses. The configuration of the compute nodes will be modified to include SSDs that have been tested and benchmarked by Dell to support the Warp Speed index and cache.

Prospective customers can access Dell’s Data Lakehouse in a Dell Demo Center and soon in the Customer Solution Center for interactive exploration and system validation. Customers and partners can get started by creating a free account in the Demo Center.