Dremio pushes further into data warehouse land

Data lake supplier Dremio has updated its software to add yet more functionality it says is found in data warehouses.

Dremio supplies open source-based data lake software. It argues that data warehouses are too restrictive because they rely on extract, transform and load (ETL) procedures to get data from different sources into the warehouse, where it can be analyzed. The alternative view is that data from the sources should be amalgamated into a single data lake with a more direct connection to analytics processing, and with less costly data storage and analytics functions.

Tomer Shiran, co-founder and CPO, said that Dremio’s facilities now “span the full spectrum of data warehousing use cases, enabling companies to shift their strategy from a proprietary and expensive data warehouse to a flexible and open data lakehouse.”

The new functionality that Dremio has added to its data lake software includes:

DML – support for Data Manipulation Language (DML) operations (INSERT, UPDATE, DELETE) on Apache Iceberg tables (Iceberg being an open source table format that lets SQL engines work on petabyte-scale analytic tables).

Time travel – in-place querying of historical data. Dremio says this provides functionality previously only found in database and data warehouse technologies. 

New SQL functionality – Dremio’s software and Dremio Cloud now have a semi-structured MAP data type, allowing querying of map data from Apache Parquet files, Apache Iceberg, and Delta Lake. There are MERGE statement and FROM clause improvements, and improvements to scalar SQL User-Defined Functions (UDFs), tabular UDFs, LISTAGG, the QUALIFY clause, and LIKE ANY/ALL/SOME statements.

Security enhancements – row and column-level, policy-defined access control for users, new role-based access control (RBAC) privileges for admin operations, and encryption for a project store (S3 buckets) with customer-managed keys. 

Performance improvements – Graviton2 support for customers within AWS, and spillable hash join functionality. This means a join operator can spill to disk when the build side of the join does not fit in memory.
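
To illustrate the general idea of a hash join that spills, here is a minimal Python sketch. It is not Dremio's implementation: the function names, the `max_build_rows` row-count limit (standing in for a memory budget) and the pickle-based spill files are all assumptions made for illustration. If the build side fits under the limit, the join runs entirely in memory. Otherwise both inputs are partitioned by a hash of the join key into temporary files on disk and joined one partition pair at a time.

```python
import itertools
import os
import pickle
import tempfile
from collections import defaultdict


def _in_memory_join(build_rows, probe_rows, key):
    """Classic hash join: hash the build side, then stream the probe side."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            yield {**build, **probe}


def _spill(rows, key, num_partitions, directory, prefix):
    """Write each row to the spill file selected by hashing its join key."""
    paths = [os.path.join(directory, f"{prefix}_{i}.bin") for i in range(num_partitions)]
    files = [open(path, "wb") for path in paths]
    try:
        for row in rows:
            pickle.dump(row, files[hash(row[key]) % num_partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read_rows(path):
    """Stream pickled rows back from a spill file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def spillable_hash_join(build_rows, probe_rows, key, max_build_rows=1000, num_partitions=4):
    """Inner-join two iterables of dicts on `key`, spilling to disk if needed.

    `max_build_rows` is a hypothetical stand-in for a real memory budget.
    """
    build_iter = iter(build_rows)
    buffered = []
    for row in build_iter:
        buffered.append(row)
        if len(buffered) > max_build_rows:
            break  # build side is too large: fall through to the spill path
    else:
        # The whole build side fit under the limit, so no spill is needed.
        yield from _in_memory_join(buffered, probe_rows, key)
        return

    with tempfile.TemporaryDirectory() as tmp:
        # Rows with equal keys hash to the same partition number on both sides.
        build_paths = _spill(itertools.chain(buffered, build_iter), key, num_partitions, tmp, "build")
        probe_paths = _spill(probe_rows, key, num_partitions, tmp, "probe")
        for build_path, probe_path in zip(build_paths, probe_paths):
            # Assumes each build partition now fits in memory; a production
            # engine would re-partition recursively if it did not.
            yield from _in_memory_join(_read_rows(build_path), _read_rows(probe_path), key)
```

Calling `spillable_hash_join(customers, orders, "id", max_build_rows=3)` on two lists of dicts would force the spill path for any build side with more than three rows. The matched rows are the same as in the in-memory case, although they may come back in a different order.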

Usability updates – a functions list provides users with a searchable list of supported SQL functions and the syntax and description of each. Function syntax from this component can be added to the SQL runner with one click. 

SSO – Dremio has added Single Sign-On (SSO) functionality with Tableau (Salesforce) and Power BI (Microsoft). This delivers granular access control and visibility into consumption. 

Dremio now has a partner-validated connector with dbt (data build tool) so its users can build data pipelines using SQL. It has also built native connectors for Snowflake, MongoDB, DB2, OpenSearch and Azure Data Explorer, meaning users can query data in these repositories. Finally, it has added Arrow Flight ODBC and JDBC drivers. ODBC is the Open Database Connectivity API and JDBC is the Java Database Connectivity API.