IBM touts Ceph for data lakehouses, generative AI

It’s a year since IBM integrated Red Hat’s Ceph storage product roadmap into its own, and it wants us to know that it’s making progress in this increasingly AI-dominated environment.

Gerald Sternagl, IBM

A blog post going live today, written by Gerald Sternagl, technical product manager for IBM Storage Ceph, says: “This self-healing and self-managing platform is designed to deliver unified file, block, and object storage services at scale on industry standard hardware. Unified storage helps provide clients a bridge from legacy applications running on independent file or block storage to a common platform that includes those and object storage in a single appliance.

“Software-defined storage has emerged as a transformative force when it comes to data management, offering a host of advantages over traditional legacy storage arrays, including extreme flexibility and scalability that are well-suited to handle modern use cases like generative AI.”
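The unification Sternagl describes rests on Ceph’s architecture: block (RBD), file (CephFS), and object (RGW) access are all layered over the same underlying RADOS object store. As a minimal sketch of that common layer, assuming a running cluster reachable via /etc/ceph/ceph.conf and a hypothetical pool named “datalake”, Ceph’s Python rados binding can be used like this:

```python
# Minimal sketch: writing to the RADOS layer that underpins Ceph's
# block (RBD), file (CephFS), and object (RGW) services.
# Assumes python3-rados is installed, /etc/ceph/ceph.conf points at a
# running cluster, and a pool named "datalake" exists (hypothetical names).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("datalake")  # pool name is an assumption
    try:
        # Store and read back a raw object; RBD images and CephFS files
        # ultimately decompose into objects handled this same way.
        ioctx.write_full("training-sample-001", b"raw bytes of a dataset shard")
        print(ioctx.read("training-sample-001"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```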

Sternagl is implicitly critical of IBM’s own legacy storage array hardware, such as its mainframe DS8000 and x86 server FlashSystem arrays, but then he is a Red Hat veteran with more than ten years’ service before IBM acquired the company.

In his view: “Ceph is optimized for large single and multisite deployments and can efficiently scale to support hundreds of petabytes of data and tens of billions of objects, which is key for traditional and newer generative AI workloads.” It can support data lakehouse and AI/ML open source frameworks and “more traditional workloads such as MySQL and MongoDB on Red Hat OpenShift or Red Hat OpenStack.”

There is “a feedback loop where generative AI thrives on the abundance of unstructured data, and the continuous generation of realistic data by AI further enriches and refines your understanding of unstructured datasets, fostering innovation and advancements.”

Some 768 TiB of raw Storage Ceph capacity is included in watsonx.data, IBM’s data lakehouse architecture for data, analytics, and AI workloads.

Sternagl says: “Organizations … need a storage management solution capable of accelerated data ingest, data cleansing and classification, metadata management and augmentation, and cloud-scale capacity management and deployment, such as software-defined storage.” It also needs to support both on-premises and public cloud environments.

By software-defined storage, he means Ceph of course. His company is not promoting MinIO, Cloudian, Scality, DataCore, or WekaIO here.

In December, IBM updated Ceph with object lock immutability for ransomware protection, alongside previews of NVMe-oF block access and NFS support for data ingest into the underlying Ceph object store.
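Ceph exposes object lock through the RADOS Gateway’s S3-compatible API, following the S3 object lock model. As a hedged sketch (the endpoint URL, credentials, bucket, and key below are placeholder assumptions, not IBM-documented values), writing an immutable backup object in compliance mode might look like this:

```python
# Sketch of S3-style object lock against a Ceph RADOS Gateway endpoint.
# The endpoint URL, credentials, and names below are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",  # Ceph RGW S3 endpoint (assumed)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Object lock must be enabled at bucket-creation time.
s3.create_bucket(Bucket="backups", ObjectLockEnabledForBucket=True)

# Write an object with a retention date. In COMPLIANCE mode the object
# cannot be overwritten or deleted by anyone until the date passes.
s3.put_object(
    Bucket="backups",
    Key="db-dump-2024-01-15.tar.gz",
    Body=b"...backup bytes...",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=30),
)
```

That inability to overwrite or delete locked objects, even with administrator credentials, is what makes the feature useful against ransomware.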

Comment

An issue with any three-way combination of block, file, and object storage is that each access protocol implementation has to be aware of the others, which can delay and possibly limit the adoption of new features such as NVMe-oF and NFS support. If you need all three protocols supported in a single software package, Ceph is a good choice, but block-only, combined block-and-file, file-only, or combined object-and-file alternatives may support new features faster and provide speedier data access.