Scality: AI-focused RING XP runs circles around slow object storage

Scality’s all-flash RING XP provides GPUDirect-class object storage for AI workloads, enabling it to compete with file systems in pumping data to GPU servers.

The RING is Scality’s core object storage software, and it has been loaded with optimizations to provide, we’re told, microsecond response-time latencies for small object data, making it ideal for AI model training and fine-tuning. Scality claims it is 10–20X faster than Amazon S3 Express One Zone and 50–100X faster than standard Amazon S3 in terms of latency, meaning microsecond rather than millisecond speeds. GPUDirect is an Nvidia file storage access protocol that gives GPU servers direct access to NVMe storage drives, lowering access latency.

Giorgio Regni

Giorgio Regni, CTO and co-founder of Scality, said in a statement: “Object storage is a natural foundational repository for exabytes of data across the AI pipeline. With RING XP, we’ve not only optimized object storage for extreme performance but also reduced data silos — offering one seamless flexible technology for both long term data retention and ultra-high performance AI workloads.”

S3 Express One Zone’s latency to the first byte is in the single-digit milliseconds, compared with double-digit milliseconds for standard S3, claims Scality. It says RING XP (eXtreme Performance) has:

  • AI-optimized object storage connectors to provide scale-out, fast access to storage from applications
  • Performance-tuned software that accelerates storage I/O throughout the storage stack
  • AMD EPYC-based all-flash NVMe storage servers from Lenovo, Supermicro, Dell and HPE with support for PCIe and NVMe, and the highest number of cores in single-socket CPUs to deliver optimal latencies

Scality claims the XP version of RING provides microsecond-level write (PUT) and read (GET) latency for 4KB objects. It can also provide storage for all AI pipeline stages, from ingesting massive datasets to model training and inference. RING XP has integrated lifecycle management covering these pipeline stages and a common framework of management and monitoring tools.
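
As a rough way to sanity-check small-object latency claims against any S3-compatible endpoint, one could time 4KB PUTs and GETs, as in the hedged Python sketch below. The endpoint URL, bucket name and credentials are placeholders rather than Scality defaults, and boto3’s own per-call overhead runs to hundreds of microseconds, so verifying microsecond-level figures would need a lighter-weight client.

```python
# Hedged sketch: time 4KB PUT/GET round trips against an S3-compatible endpoint.
# The endpoint, bucket and credentials below are placeholders, not Scality defaults.
import os
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ring-xp.example.local",   # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

BUCKET = "latency-test"
payload = os.urandom(4096)          # 4KB object, matching the claim above

put_lat, get_lat = [], []
for i in range(1000):
    key = f"obj-{i:06d}"

    t0 = time.perf_counter()
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    put_lat.append(time.perf_counter() - t0)

    t0 = time.perf_counter()
    s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    get_lat.append(time.perf_counter() - t0)

for name, lat in (("PUT", put_lat), ("GET", get_lat)):
    lat.sort()
    print(f"{name}: p50={lat[len(lat) // 2] * 1e6:.0f}µs  p99={lat[int(len(lat) * 0.99)] * 1e6:.0f}µs")
```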

It also has integrated Scality CORE5 capabilities to boost security and data privacy.

We wanted to dive deeper into RING XP, and here is how Regni and Scality CEO and co-founder Jérôme Lecat answered our questions:

Blocks & Files: How does RING XP achieve its higher level of performance?

Scality: RING XP achieves this new level of object storage performance for AI in a few key ways.

[We’re] streamlining the object API: the right tradeoff for AI applications is to eliminate unnecessary API features and heaviness in exchange for low latency and high throughput – as Amazon has shown the market with S3 Express One Zone. 

With that said, we believe that Amazon left its users with a weak tradeoff in S3 Express One Zone that doesn’t provide sufficient performance gains. With RING XP we’re going even further by simplifying the object stack and removing features that are commonly performed at a higher level in the stack. 

For example, every data lake management system (or data lakehouse) maintains an indexed view of all data across the entire dataset. That’s why we made the decision to remove object listing from RING XP, since it would be an unnecessary duplication of functionality and effort. 
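
To illustrate the division of labour Regni is describing, here is a minimal Python sketch in which the data platform keeps its own key catalog (a SQLite table standing in for a lakehouse catalog or table format), so reads go straight to known keys and never depend on the object store’s listing API. The endpoint, bucket and schema are illustrative assumptions, not part of RING XP.

```python
# Minimal sketch: the platform's own catalog, not the object store, tracks keys,
# so the store never needs to serve ListObjects-style requests.
import sqlite3
import boto3

s3 = boto3.client("s3", endpoint_url="https://ring-xp.example.local")  # placeholder endpoint
catalog = sqlite3.connect("catalog.db")
catalog.execute("CREATE TABLE IF NOT EXISTS objects (dataset TEXT, key TEXT)")

def ingest(dataset, key, body):
    # Write the object, then record its key in the catalog for later retrieval.
    s3.put_object(Bucket="training-data", Key=key, Body=body)
    catalog.execute("INSERT INTO objects VALUES (?, ?)", (dataset, key))
    catalog.commit()

def read_dataset(dataset):
    # Enumerate keys from the catalog, then fetch each object directly by key.
    keys = [k for (k,) in catalog.execute(
        "SELECT key FROM objects WHERE dataset = ?", (dataset,))]
    for key in keys:
        yield s3.get_object(Bucket="training-data", Key=key)["Body"].read()
```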

RING XP builds on several optimizations in our software that have let us power the largest email service provider platforms in the world, processing 100K small object transactions per second.

Blocks & Files: Have you managed to get GPUDirect-like capabilities into Ring? Is there some way Ring XP sets up a direct GPU memory-to-SSD (holding the object data) connection?

Scality: RING XP currently achieves microsecond latencies even without Nvidia GPUDirect, as current libraries don’t yet support object storage directly. However, we have active R&D underway for GPUDirect compatibility with object storage, which we see as a major opportunity to further reduce RING XP’s latency.

GPUDirect for filesystems uses FIEMAP (the ioctl_fiemap() command), which retrieves the physical location of file data on storage by mapping file extents. This is valuable for direct-access applications like GPUDirect Storage, as it enables the GPU to access data blocks directly on disk or over the network during training or inference.
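
For context, the Python sketch below shows what a FIEMAP query looks like on Linux: it asks the filesystem for the physical extents backing a file. This is a generic illustration of the ioctl being referred to, assuming a local filesystem that implements FIEMAP (ext4 and XFS do); it is not a description of GPUDirect Storage or RING internals.

```python
# Generic Linux FIEMAP sketch: map a file's logical offsets to physical extents.
# Assumes Linux and a filesystem that implements FIEMAP (ext4, XFS, ...).
import fcntl
import struct
import sys

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x00000001     # flush dirty data before mapping

# struct fiemap header: fm_start, fm_length (u64); fm_flags, fm_mapped_extents,
# fm_extent_count, fm_reserved (u32)
HDR_FMT = "=QQIIII"
# struct fiemap_extent: fe_logical, fe_physical, fe_length, 2 reserved u64;
# fe_flags plus 3 reserved u32
EXT_FMT = "=QQQQQIIII"

def file_extents(path, max_extents=32):
    with open(path, "rb") as f:
        hdr = struct.pack(HDR_FMT, 0, 0xFFFFFFFFFFFFFFFF, FIEMAP_FLAG_SYNC,
                          0, max_extents, 0)
        buf = bytearray(hdr + b"\0" * struct.calcsize(EXT_FMT) * max_extents)
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
        n_mapped = struct.unpack_from(HDR_FMT, buf)[3]
        extents, offset = [], struct.calcsize(HDR_FMT)
        for _ in range(n_mapped):
            logical, physical, length, *_ = struct.unpack_from(EXT_FMT, buf, offset)
            extents.append((logical, physical, length))
            offset += struct.calcsize(EXT_FMT)
        return extents

if __name__ == "__main__":
    for logical, physical, length in file_extents(sys.argv[1]):
        print(f"logical {logical:#x} -> physical {physical:#x} ({length} bytes)")
```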

To achieve similar functionality, we’re developing an extension for the S3 protocol in RING and RING XP called ObjectMap. ObjectMap will provide the same extent-level data mapping as FIEMAP, but it will be native to our object storage system. 

Our goal is to publicly document ObjectMap to enable other object storage systems to adopt it, aiming to make it an industry standard and position object storage as the optimal solution for AI deployments.

It’s also worth noting that Nvidia’s Magnum IO toolchain currently relies heavily on filesystem-based data, such as local storage or NFS. Moving forward, we anticipate the need for that toolchain to evolve and become more storage-agnostic.

Blocks & Files: Out of curiosity, does the Nvidia BaM concept have any relevance here?

Scality: Yes, Nvidia’s BaM concept is highly relevant. By enabling GPUs to directly access SSDs, BaM could allow object storage to effectively expand GPU memory, letting GPUs access object data directly. With a mapping layer similar to FIEMAP, BaM could make object storage a practical, high-speed extension for GPU memory, benefiting AI and analytics workloads that require rapid data retrieval.

Blocks & Files: Is RING XP protected by patents? And I expect there is a RAG angle here as well. Perhaps the Ring can store vectors as objects?

Scality: Yes, RING XP is indeed protected by our core patent. Key elements include:

  • Low-latency DHT: the architecture builds a distributed hash table over the storage nodes, ensuring guaranteed, bounded access times without needing a central database. This eliminates bottlenecks typical in centralized architectures.
  • O(1) access: in normal operations (no disruptions), access nodes cache the entire topology, enabling O(1) access to any object in the system, contributing to our low-latency performance (a generic sketch of this pattern follows this list).
  • Late materialization: this technique enables network operations to first hit memory or fast metadata flash for a high percentage of I/O requests. This optimizes performance (latency) by shielding the disk drives used to store data from all non-data operations. With NVMe flash at the storage layer, RING XP maintains microsecond response times through the full I/O stack.
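
As a point of reference for those first two properties, here is a generic consistent-hashing sketch in Python: every client holds a cached copy of the topology, so placing or locating a key is a purely local computation with no round trip to a central metadata service. This is a textbook pattern, not Scality’s patented design; the node names and virtual-node count are arbitrary assumptions.

```python
# Generic consistent-hashing sketch: a cached topology lets any client map an
# object key to its owning node locally, with no central lookup service.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Build the ring once; each access node caches this full topology.
        self._ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, object_key):
        # Hash the key and walk clockwise to the first node position on the ring.
        idx = bisect.bisect(self._keys, self._hash(object_key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.node_for("bucket/object-000001"))   # deterministic placement, no lookup hop
```

The local lookup in this sketch is a hash plus a binary search over the cached ring; the latency win being claimed comes from avoiding the network hop to a central database, not from the precise cost of that local step.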

We’re excited to push object storage into new territory with RING XP, and we have patents pending to protect this innovation further, around optimizing on-drive data layout for GPUs and vector data co-location.

On storing vectors, absolutely – RING is designed to handle vector data as objects, making it ideal for RAG (Retrieval-Augmented Generation) applications and AI workloads requiring rapid access to complex datasets.

As an object store, RING already provides a comprehensive capability for storing user-/application-extended metadata along with the object data, and this will be an ideal mechanism for storing vectors.
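
As a rough illustration of that mechanism using the standard S3 API, the sketch below attaches a small JSON-encoded embedding as user-defined metadata (x-amz-meta-*) on the object it describes. The endpoint, bucket, key and embedding values are placeholders, and S3-style user metadata is limited to a couple of kilobytes per object, so larger vectors would more plausibly live in the object body or a companion object.

```python
# Hedged sketch: store a small embedding as user-defined S3 metadata on an object.
# Endpoint, bucket, key and vector values are illustrative placeholders.
import json
import boto3

s3 = boto3.client("s3", endpoint_url="https://ring.example.local")  # placeholder endpoint

chunk_text = b"The quick brown fox jumps over the lazy dog."
embedding = [0.12, -0.08, 0.33, 0.91]        # stand-in for a real model's output

s3.put_object(
    Bucket="rag-corpus",
    Key="docs/example/chunk-0001",
    Body=chunk_text,
    # User metadata values must be strings; JSON-encode the small vector.
    Metadata={"embedding": json.dumps(embedding)},
)

head = s3.head_object(Bucket="rag-corpus", Key="docs/example/chunk-0001")
vector = json.loads(head["Metadata"]["embedding"])
print(vector)
```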