IBM’s lowercase watsonx.data capitalizes on lakehouses

Published Thu 11 May 2023 // 13:17 UTC

IBM has unveiled its watsonx.data datastore, which uses lakehouse underpinnings to run multiple query engines for AI and analytics workloads, and claims it can cut data warehouse costs by up to 50 percent.

This datastore is the core part of the overall watsonx AI and data platform launched at the IBM Think event, which also includes an AI development studio. IBM says watsonx.data's lakehouse bridges the gap between data warehouses and data lakes, offering the flexibility of a data lake with the performance and structure of a data warehouse.


The announcement comes bundled with quotes from Intel and Cloudera but not, oddly, IBM. This is a partnership-focused release.

Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel, said: “We recognize the importance of watsonx.data and the development of the open-source components that it’s built upon. We look forward to partnering with IBM to optimize the watsonx.data stack, achieving breakthrough performance through our joint technological contributions to the Presto open-source community.”

watsonx.data runs on-premises or in public clouds like AWS. The underlying lakehouse can hold both structured and unstructured data. It supports open data formats, such as Apache Parquet and Avro, and table formats like Apache Iceberg. Iceberg is open source software that enables SQL commands to work on petabyte-scale analytic tables, and those tables can sit on object storage.

The platform is intended to be a single point of entry to the lakehouse, providing access to multiple query engines such as Presto and Spark, along with Meta's Velox, an open source unified execution engine acceleration library.


Presto, the in-memory distributed SQL datalake query engine, has a starring role here, building on IBM’s acquisition of Ahana in April.

IBM says watsonx.data offers built-in governance, automation, observability, and integrations with an organization's existing databases and tools to simplify setup and the user experience. It is engineered to use the built-in accelerators in Intel's new 4th Gen Xeon SP CPUs.

IBM’s tech partners are at the fore here. Paul Codding, EVP of Product Management of Cloudera, said: “IBM and Cloudera customers will benefit from a truly open and interoperable hybrid data platform that fuels and accelerates the adoption of AI across an ever-increasing range of use cases and business processes.”

Soo Lee, Director Worldwide Strategic Alliances at AWS, said: “Making watsonx.data available as a service in AWS Marketplace further supports our customers’ increasing needs around hybrid cloud – giving them greater flexibility to run their business processes wherever they are, while providing choice of a wide range of AWS services and IBM cloud native software attuned to their unique requirements.”


But when we checked, watsonx.data was not yet available in the AWS Marketplace.

IBM says watsonx.data integrates with StepZen, Databand.ai, IBM Watson Knowledge Catalog, IBM zSystems, IBM Watson Studio, and IBM Cognos Analytics with Watson. IBM says these integrations enable watsonx.data users to implement various data catalog, lineage, governance, and observability offerings across their data ecosystems.

The watsonx.data roadmap includes incorporating the latest performance enhancements to Presto via Velox and Ahana. It will also incorporate IBM’s Storage Fusion technology to enhance data caching across remote sources as well as semantic automation capabilities built on IBM Research’s foundation models to automate data discovery, exploration, and enrichment through conversational user experiences.

A diagram in an IBM watsonx.data ebook shows multiple query engines accessing a metadata store, underneath which is an object store with links to structured, data warehouse, semi-structured, unstructured and data lake data.
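To make the diagram's layering concrete, here is a minimal Python sketch under heavily simplified assumptions. It is not watsonx.data's API or IBM's design: `MetadataStore`, `TableEntry`, `Engine`, and the `s3://bucket/...` keys are all hypothetical, and plain dicts stand in for Parquet or Avro files in object storage. It shows only the idea that several query engines resolve tables through one shared metadata store and therefore read the same underlying data.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an object store: object key -> rows.
# A real lakehouse would hold Parquet/Avro files; dicts keep this self-contained.
ObjectStore = dict[str, list[dict]]


@dataclass
class TableEntry:
    """Catalog record: which objects hold a table's data, and their format."""
    files: list[str]
    file_format: str  # a label only in this sketch, e.g. "parquet"


@dataclass
class MetadataStore:
    """Hypothetical shared catalog consulted by every query engine."""
    tables: dict[str, TableEntry] = field(default_factory=dict)

    def register(self, name: str, entry: TableEntry) -> None:
        self.tables[name] = entry

    def lookup(self, name: str) -> TableEntry:
        return self.tables[name]


class Engine:
    """Toy query engine; the names "presto" and "spark" below are just labels."""

    def __init__(self, name: str, catalog: MetadataStore, store: ObjectStore):
        self.name = name
        self.catalog = catalog
        self.store = store

    def scan(self, table: str, predicate=lambda row: True) -> list[dict]:
        # Resolve the table via the shared catalog, then read each object it lists.
        entry = self.catalog.lookup(table)
        rows: list[dict] = []
        for key in entry.files:
            rows.extend(r for r in self.store[key] if predicate(r))
        return rows


store: ObjectStore = {
    "s3://bucket/sales/part-0": [{"region": "EU", "amount": 120}],
    "s3://bucket/sales/part-1": [{"region": "US", "amount": 300},
                                 {"region": "EU", "amount": 80}],
}
catalog = MetadataStore()
catalog.register("sales", TableEntry(files=list(store), file_format="parquet"))

# Two different engines share one catalog and one object store.
presto = Engine("presto", catalog, store)
spark = Engine("spark", catalog, store)

eu_total = sum(r["amount"] for r in presto.scan("sales", lambda r: r["region"] == "EU"))
all_rows = spark.scan("sales")
print(eu_total, len(all_rows))  # prints "200 3"
```

The point of routing every engine through the catalog is that no engine keeps its own copy of the data or the table definitions; swapping or adding an engine does not require moving data, which is the flexibility the lakehouse pitch rests on.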

IBM does not say which types of data lakes and lakehouses are supported; Dremio, for example, is not mentioned.

IBM claims watsonx.data will extend its market leadership in data and AI, but its announcement says nothing about using ChatGPT-like large language models.

watsonx.data is in a closed beta phase and expected to be generally available in July 2023. IBM offers a downloadable ebook; it won't tell you much more, but you'll get a flavor of IBM's thinking.
