IBM has unveiled its watsonx.data datastore, which uses lakehouse underpinnings to run multiple query engines for AI and analytics workloads, and claims it can cut data warehouse costs by up to 50 percent.
The datastore is the core of the overall watsonx AI and data platform launched at the IBM Think event, which also includes an AI development studio. IBM says watsonx.data's lakehouse bridges the gap between data warehouses and data lakes, offering the flexibility of a data lake with the performance and structure of a data warehouse.
The announcement comes bundled with quotes from Intel and Cloudera but not, oddly, IBM. This is a partnership-focused release.
Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel, said: “We recognize the importance of watsonx.data and the development of the open-source components that it’s built upon. We look forward to partnering with IBM to optimize the watsonx.data stack, achieving breakthrough performance through our joint technological contributions to the Presto open-source community.”
watsonx.data runs on-premises or in public clouds such as AWS. The lakehouse underneath can hold both structured and unstructured data. It supports open data formats, such as Apache Parquet and Avro, and open table formats such as Apache Iceberg, the open source software that enables SQL commands to work on petabyte-scale analytic tables. The storage layer beneath that can be object storage.
The platform is intended to be a single point of entry to the lakehouse, providing access to multiple query engines such as Presto and Spark, plus Meta's Velox, an open source unified execution engine acceleration library.
Presto, the in-memory distributed SQL data lake query engine, has a starring role here, building on IBM's acquisition of Ahana in April.
IBM says watsonx.data offers built-in governance, automation, observability and integrations with an organization's existing databases and tools to simplify setup and the user experience. It is engineered to use the built-in accelerators in Intel's new 4th Gen Xeon SP CPUs.
IBM’s tech partners are at the fore here. Paul Codding, EVP of Product Management of Cloudera, said: “IBM and Cloudera customers will benefit from a truly open and interoperable hybrid data platform that fuels and accelerates the adoption of AI across an ever-increasing range of use cases and business processes.”
Soo Lee, Director Worldwide Strategic Alliances at AWS, said: “Making watsonx.data available as a service in AWS Marketplace further supports our customers’ increasing needs around hybrid cloud – giving them greater flexibility to run their business processes wherever they are, while providing choice of a wide range of AWS services and IBM cloud native software attuned to their unique requirements.”
But watsonx.data is not yet available in the AWS Marketplace. We checked.
IBM says watsonx.data integrates with StepZen, Databand.ai, IBM Watson Knowledge Catalog, IBM zSystems, IBM Watson Studio, and IBM Cognos Analytics with Watson. According to IBM, these integrations let watsonx.data users implement data catalog, lineage, governance and observability offerings across their data ecosystems.
The watsonx.data roadmap includes incorporating the latest performance enhancements to Presto via Velox and Ahana. It will also incorporate IBM's Storage Fusion technology to enhance data caching across remote sources. In addition, it will add semantic automation capabilities, built on IBM Research's foundation models, to automate data discovery, exploration and enrichment through conversational user experiences.
A diagram in an IBM watsonx.data ebook shows multiple query engines accessing a metadata store, underneath which is an object store with links to structured, data warehouse, semi-structured, unstructured and data lake data.
IBM does not say which types of data lakes and lakehouses are supported; Dremio, for example, is not named.
IBM claims watsonx.data will extend its market leadership in data and AI, but its announcement says nothing about using ChatGPT-like large language models.
watsonx.data is in a closed beta phase and is expected to be generally available in July 2023. IBM offers a downloadable watsonx.data ebook. It won't tell you much more, but you'll get a flavor of IBM's thinking.