Date: 2025-07-09

IBM’s developers are adding features to its watsonx.data and other parts of its watsonx AI software to help its customers build and operate AI agents that get faster responses to data store requests and comply with governance monitoring.

IBM’s watsonx.data is a datastore with lakehouse underpinnings that runs multiple query engines, such as Presto, Spark and Meta’s Velox, for AI and analytics workloads. It is part of the overall watsonx AI and data platform launched at IBM’s Think event in May 2023. The watsonx.data code runs on-premises or in public clouds like AWS. Its lakehouse part can store both structured and unstructured data, and supports open data formats, such as Apache Parquet and Avro, and table formats like Apache Iceberg.

Think of watsonx.data as a set of query engines that access a metadata store, which is layered above an object store. The object store could be IBM’s Storage Scale, Ceph, or an AWS, Azure or GCP object store. The compute and storage layers are separate and disaggregated.
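The layering described above can be pictured with a toy model. This is only a sketch under stated assumptions: the class names `ObjectStore`, `MetadataStore` and `QueryEngine`, the table name and the file keys are all hypothetical and do not correspond to any watsonx.data API. It shows two independent engines resolving the same table through a shared catalog and reading the same objects.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Storage layer: a flat key-to-bytes store, standing in for an S3-style bucket."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


@dataclass
class MetadataStore:
    """Catalog layer: maps a table name to the object keys that hold its data."""
    tables: dict[str, list[str]] = field(default_factory=dict)

    def register(self, table: str, keys: list[str]) -> None:
        self.tables[table] = list(keys)

    def files_for(self, table: str) -> list[str]:
        return self.tables[table]


@dataclass
class QueryEngine:
    """Compute layer: holds no data, only references to the shared catalog and store."""
    name: str
    catalog: MetadataStore
    store: ObjectStore

    def count_rows(self, table: str) -> int:
        # Resolve the table through the catalog, then read each file from storage.
        total = 0
        for key in self.catalog.files_for(table):
            total += len(self.store.get(key).decode().splitlines())
        return total


store = ObjectStore()
catalog = MetadataStore()
store.put("sales/part-0.csv", b"1,apple\n2,pear\n")
store.put("sales/part-1.csv", b"3,plum\n")
catalog.register("sales", ["sales/part-0.csv", "sales/part-1.csv"])

# Two independent engines query the same table without copying any data.
engine_a = QueryEngine("engine-a", catalog, store)
engine_b = QueryEngine("engine-b", catalog, store)
print(engine_a.count_rows("sales"), engine_b.count_rows("sales"))
```

Because the engines hold no data of their own, either one could be added, removed or resized without touching the stored objects, which is the practical point of disaggregating compute from storage.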

In May we learned that a watsonx.data development will bring together an open data lakehouse with data fabric capabilities, like data lineage tracking and governance, to help customers unify, govern, and access data across silos, formats, and clouds.

The latest v2.2 watsonx.data release is part of an overall watsonx AI data platform update with three components:

watsonx.ai: enterprise-grade AI studio to operationalize and scale the development of AI apps with traditional machine learning and generative AI capabilities and data

watsonx.data: open, hybrid, and governed data store

watsonx.governance: directs, manages, and monitors your organization’s AI activities with responsibility, transparency, and explainability

The watsonx.ai component gets enhancements to its AutoAI RAG feature including support for:

Elasticsearch vector store

Multiple correct answers for each question in an evaluation data asset

Multilingual models to match input language for experiments

Visibility of the importance of each setting for creating and ranking the optimized patterns

Imported custom foundation models for experiments

A hybrid search strategy to retrieve content from indexed documents
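To make the hybrid search item concrete: hybrid retrieval generally means running a keyword search and a vector search over the same index and merging the two result lists. IBM's notes, as reported here, do not say how AutoAI RAG merges them, so the sketch below assumes reciprocal rank fusion, one common technique. The function name, the constant `k=60` and the document IDs are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents ranked highly
            # by more than one retriever accumulate the largest scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_b", "doc_a", "doc_d"]  # hypothetical keyword-search results
vector_hits = ["doc_a", "doc_c", "doc_b"]   # hypothetical vector-search results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents that both retrievers return (`doc_a` and `doc_b`) rise above those found by only one of them, which is the behavior a hybrid strategy is meant to deliver.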

There’s more. Users – clients in IBM-speak – can simplify business documents by using the new version of the watsonx.ai Text Extraction API, which supports additional document formats such as Microsoft Word and PowerPoint, HTML, and various image formats. Imported custom foundation models can include models that belong to the Tiny Time Mixer (TTM) family, which will be accessible through the time series API. The Tuning Studio gains parameter-efficient fine-tuning (PEFT) techniques, namely Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), plus a synthetic data generation method to create unstructured text datasets for model tuning and evaluation.
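A quick back-of-the-envelope calculation shows why LoRA counts as parameter-efficient. Instead of updating a full weight matrix, LoRA trains two small matrices whose product is a low-rank update; QLoRA does the same while keeping the frozen base weights in quantized form. The sketch below uses illustrative layer sizes and a rank that are assumptions, not figures from IBM's Tuning Studio.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-r update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes, not taken from any IBM model
full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
print(full, lora)  # 16777216 65536
```

For this one layer the low-rank update trains 65,536 values instead of about 16.8 million, roughly 0.39 percent of the full matrix.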

Clients can also deploy and test AI service templates and applications locally with the new Command Line Interface (ibm-watsonx-ai-cli). New foundation models are supported: granite-3-3-8b-instruct, meta-llama/llama-4-scout-17b-16e-instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 and mistral-small-3-1-24b-instruct-2503.

V2.2 watsonx.data gets an IBM watsonx.data Premium Edition for an integrated user experience to manage the unstructured and structured data for the AI lifecycle. A Data Workbench, built on integrated development environment (IDE) concepts, centralizes features and capabilities of the Data Manager and Query Manager. The Gluten component improves Spark performance by running query execution in native C++.

Apache Gluten is an open-source project intended to improve Apache Spark performance by offloading compute-intensive SQL query execution from the Java Virtual Machine (JVM) to native C++ engines.
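For readers who want to see what this offload looks like from the Spark side, the sketch below builds the `--conf` arguments that a Gluten-enabled `spark-submit` typically needs. The keys and values are assumptions based on the open-source Apache Gluten getting-started documentation, and the plugin class name has changed between Gluten releases. Nothing here is specific to watsonx.data, which may configure Gluten differently; `job.py` and the off-heap size are placeholders.

```python
# Assumed settings, based on the Apache Gluten getting-started documentation.
# The plugin class name has changed between Gluten releases, so check the
# documentation for the version you deploy. None of this is watsonx.data-specific.
GLUTEN_CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",  # native engines use off-heap memory
    "spark.memory.offHeap.size": "4g",       # illustrative size
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
}


def spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Turn a config mapping into repeated --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# "job.py" is a hypothetical application script.
print(" ".join(["spark-submit", *spark_submit_args(GLUTEN_CONF), "job.py"]))
```

The application code itself does not change: the plugin intercepts Spark's physical plan and hands supported operators to the native engine, falling back to the JVM for the rest.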

There are defined policies and rules to scale the Milvus vector database engine in and out. A Data Product Hub integration enables data producers to create and store a Data Product Asset in watsonx.data’s store and share it with their data consumers.

Lastly, an interconnection with the watsonx Business Intelligence analytics agent executes queries using natural language processing (NLP).

The watsonx.governance element gets capabilities for agentic AI, with the ability to use workflows to review and evaluate agents before onboarding them into an organization’s inventory. There is an updated dashboard with metrics related to agents. Additional capabilities related to the evaluation of agentic AI are also available, initially with the SaaS delivery option.

IBM v2.2 watsonx release notes can be found here.

Comment

IBM has developed watsonx.data separately from its Storage Scale and Ceph storage products. This is a different strategy from that of Cloudian, WEKA and VAST Data, which are erecting AI pipeline software stack elements on top of their storage products with, for example, Cloudian adopting the Milvus vector database and VAST Data developing a full-scale AI OS.

We look at all these IBM incremental line-item watsonx updates and marvel at the growing complexity of the AI agent stack. If AI agents are all they are cracked up to be, couldn’t they deal with this complex AI agent development plumbing?