IBM boosts Storage Scale with content-aware AI integration

IBM Storage Scale, the longstanding parallel file system, now includes content-awareness to enhance its retrieval-augmented generation (RAG) capability.

Storage Scale has a substantial history in high-performance computing (HPC) and enterprise use cases requiring similar high-speed file data access, including large language model (LLM) training and inference. Storage Scale v5.2.1 added high-performance S3 object storage support. The product also supports Nvidia’s GPUDirect host server memory bypass protocol. In September last year, IBM combined its watsonx data lakehouse with Storage Scale to provide object storage facilities beneath a file access overlay. The aim was to improve AI training and inference workload speed.

Now IBM has expanded on this collaboration with Nvidia, announcing:

  • Content-aware Storage Scale (CAS) capability building on Nvidia’s AI-Q blueprint and NeMo Retriever microservice.
  • Expanded watsonx integrations via support for Nvidia NIM microservices for external AI model integration across multiple cloud environments. IBM watsonx.governance will allow enterprises to monitor and govern NIM microservices across any hosting environment.
  • AI-focused IBM Consulting capabilities, with Nvidia providing AI integration services via Nvidia Blueprints, optimizing compute-intensive AI workloads across hybrid cloud environments using Red Hat OpenShift and Nvidia AI.
  • Nvidia H200 GPU instance availability on IBM Cloud. 
  • Storage Scale support for Nvidia’s AI Data Platform reference design.

IBM Fellow, VP, and CTO Vincent Hsu wrote about CAS, saying it is based on work by IBM Research and Nvidia. With “embedded compute, data pipelines, and vector database capabilities within the storage system, CAS reduces data movement and latency to increase efficiency.”

Storage Scale can obtain data from many sources. It has “global data abstraction services engineered to provide connectivity from multiple data sources and multiple locations to bring data into your AI factory from IBM and third-party storage environments.”

CAS can, using AI-Q and NeMo Retriever, “extract information from text, charts, graphs and even images.” It is able to “watch folders on other storage systems or in the cloud to identify changes as they occur, automatically run pre-built pipelines, and update just the changes to the vector database, ensuring that data is current for AI applications efficiently using Nvidia accelerated computing.”
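The watch-and-update pattern described here can be sketched in a few lines. This is a hypothetical illustration of incremental re-indexing, not IBM's implementation: the `toy_embed` function, `IncrementalIndex` class, and hash-based change detection are all stand-ins, where a real CAS pipeline would use NeMo Retriever extraction models and a production vector database.

```python
import hashlib
from pathlib import Path

def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hash text into a tiny vector."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:4]]

class IncrementalIndex:
    """Re-embed only changed files instead of rebuilding the whole index."""

    def __init__(self):
        self.seen_hashes: dict[str, str] = {}       # path -> content hash
        self.vectors: dict[str, list[float]] = {}   # path -> embedding

    def sync(self, folder: Path) -> list[str]:
        """Scan a watched folder; update vectors for changed files only."""
        updated = []
        for f in sorted(folder.glob("*.txt")):
            text = f.read_text()
            h = hashlib.sha256(text.encode()).hexdigest()
            if self.seen_hashes.get(str(f)) != h:   # new or modified file
                self.seen_hashes[str(f)] = h
                self.vectors[str(f)] = toy_embed(text)
                updated.append(str(f))
        return updated
```

The key property, matching the description above, is that an unchanged corpus produces zero re-embedding work on each sync pass.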

IBM Storage Scale will respond to queries using extracted and augmented data, accelerating communications between GPUs and storage via BlueField-3 DPUs and Spectrum-X networking.

CAS can detect changes in multiple unstructured data stores, convert the modified data into vectors, and feed it to AI models to improve RAG responses to user or AI assistant requests. Or as Hsu put it: “By applying innovative natural language processing techniques, we’ve developed ways to much more efficiently extract the semantic meaning from all kinds of content, making it easier to update AI tools to improve the quality of their answers.”
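The retrieval half of that RAG flow reduces to ranking stored document vectors against a query vector. This is a minimal, generic sketch of that step, with hypothetical names throughout; a production system would query a vector database rather than a Python dict, and the top-ranked documents would then be passed to an LLM as grounding context.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored documents most similar to the query."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, store[doc]), reverse=True)
    return ranked[:k]
```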

Nvidia Storage Networking Technology VP Rob Davis said: “AI agents need to rapidly access, fetch and process data at scale, and today, these steps occur in separate silos. The integration of IBM’s content-aware storage with Nvidia AI orchestrates data and compute across an optimized network fabric to overcome silos with an intelligent, scalable system that drives near real-time inference for responsive AI reasoning.”

Storage Scale content-awareness facilities will be embedded in the next update of IBM Fusion, generally available in the second quarter. Interested parties can learn more by registering for the webinar “AI First Storage – Enhancing AI Results with Content-Aware Storage,” scheduled for April 8, 2025, at 11:00 AM Eastern Time.