Legacy systems fade as arrays shift to extreme scale and parallelism for AI

Analysis: The storage array industry is undergoing a massive pivot to extreme scale and parallel, multi-protocol data delivery for AI training and inferencing, with dual-controller arrays and scale-out filer clusters being left behind as legacy technology.

AI training's storage needs first butted up against HPC's parallel file system technology, led by DDN's ExaScaler and IBM's Storage Scale, and found it wanting. AI training shops did not want to learn parallel file system intricacies; they needed access to basic file data and, later, object data. They wanted low access latency, meaning all-flash rather than disk-based systems; fast performance for both large and small files; and very high capacity, beyond hundreds of petabytes and extending to exabyte levels.

VAST Data was one of the leaders of the charge, announcing its technology six years ago. Since then it has built up a lead in supplying storage and an AI data stack (DataSpace, DataBase, DataEngine, InsightEngine) to AI training companies such as xAI and GPU cloud operators such as CoreWeave and Lambda.

Another leader was WEKA, whose parallel-capable WekaFS used standard NFS and SMB to deliver file and then S3 object data faster than scale-out filers such as Dell's PowerScale and Qumulo, until Qumulo eventually caught up in the cloud. Dell was early with GPUDirect support, adding it to PowerScale in 2021. NetApp followed in April 2023, and Hitachi Vantara announced GPUDirect support in March last year. GPUDirect support has become table stakes, but it is not enough on its own to provide full AI storage capability.

Hammerspace added to the incumbents' distress with its data orchestration technology. This, combined with its GPUDirect support, parallel NFS support, and embrace of GPU servers' local, tier-zero SSD storage, meant it could ship data very fast to GPU servers from relatively slow dual-controller file arrays, and from any other NAS and object storage as well, treating them all as a universal data space.

The success achieved by VAST, WEKA, and Hammerspace presented a problem to the incumbent file and object array and parallel file system suppliers. In response, NetApp announced an ONTAP Data Platform for AI project. Dell said it would parallelize PowerScale. HPE OEM’d VAST Data file technology and developed its own Alletra Storage MP disaggregated compute and storage hardware.

DDN announced its Infinia software, providing fast access to block, file, and object data, in late 2023, with a v2.0 update in February this year claiming up to a 100x improvement in AI data acceleration and a 10x gain in datacenter and cloud cost efficiency. This was, in effect, a recognition that its Lustre-based ExaScaler parallel file system technology faced limitations and a new approach was needed.

[Diagram: our understanding of DDN's Infinia architecture]

Huawei launched its A800 AI storage system in May 2024, saying it had a scale-out architecture with separated data and control planes and an OceanFS high-performance parallel file system supporting NFS, SMB, HDFS, S3, POSIX, and MP-IO. The A800 can provide 100 million IOPS and petabytes-per-second bandwidth. It will not affect North American organizations, given the restrictions on Huawei there, but will feature in the rest of the world.

Pure Storage announced its FlashBlade//EXA last week and its announcement material identified three technology phases for fast file and object access, starting with Lustre-type parallel file systems:

This separates file metadata from the underlying object data to provide a two-layer system: object data nodes and separate metadata nodes. Client systems ask the metadata nodes where the data they want is stored, with files striped across many data nodes, and multiple data nodes then pump their parts of a file in parallel to speed delivery. Pure says this runs into problems when there are many small files, because the metadata nodes become bottlenecks, and the client-side software is complicated.
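
To make that bottleneck concrete, here is a minimal, runnable sketch of this first-phase design. The class names and in-memory "RPCs" are illustrative stand-ins, not Lustre's actual interfaces: a metadata service records which data node holds each stripe, and a client reads all stripes in parallel after a single metadata lookup.

```python
"""A minimal sketch of phase-one, Lustre-style parallel file access.
All names and the in-memory transport are illustrative, not a real API."""
from concurrent.futures import ThreadPoolExecutor

STRIPE = 4  # toy stripe size in bytes; real systems use megabyte-scale stripes

class DataNode:
    def __init__(self):
        self.chunks = {}                      # (path, stripe index) -> bytes
    def read(self, path, index):
        return self.chunks[(path, index)]

class MetadataNode:
    """Tracks which data node holds each stripe. Every lookup hits this
    service, which is why millions of small files bottleneck here."""
    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.layouts = {}                     # path -> [(node, index), ...]
    def write(self, path, data):
        layout = []
        for i in range(0, len(data), STRIPE):
            idx = i // STRIPE
            node = self.data_nodes[idx % len(self.data_nodes)]  # round-robin striping
            node.chunks[(path, idx)] = data[i:i + STRIPE]
            layout.append((node, idx))
        self.layouts[path] = layout
    def layout(self, path):
        return self.layouts[path]

def parallel_read(mdn, path):
    layout = mdn.layout(path)                 # one metadata round trip...
    with ThreadPoolExecutor(len(layout)) as pool:
        # ...then every data node streams its stripes concurrently,
        # so bandwidth scales with the number of data nodes.
        parts = pool.map(lambda entry: entry[0].read(path, entry[1]), layout)
    return b"".join(parts)

nodes = [DataNode() for _ in range(3)]
mdn = MetadataNode(nodes)
mdn.write("/train/shard-0001", b"weights-and-activations")
assert parallel_read(mdn, "/train/shard-0001") == b"weights-and-activations"
```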

The next phase was to store both the metadata and the data in the storage nodes, with separate, scale-out compute nodes doing the data access calculations – the VAST-style approach.

In its initial marketing material, VAST said there could be up to 10,000 stateless compute nodes and 1,000 data nodes, emphasizing the enlarged capacity on offer. Pure identified problems with this approach too, saying that write bottlenecks on the data nodes can produce variable performance and that network complexity can also be an issue.
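
A minimal sketch of the contrast, under the same illustrative conventions as the previous example. The class names are hypothetical, not VAST's software; the point is that the compute nodes hold no state, so any of them can serve any request, and compute and storage scale independently.

```python
"""A minimal sketch of the phase-two, shared-everything design.
Hypothetical names; not VAST's actual software."""

class StorageNode:
    """Holds data and metadata alike; in real systems, NVMe-oF flash
    enclosures. Pure's critique: all writes funnel into these nodes,
    which can make performance variable under load."""
    def __init__(self):
        self.objects = {}                  # key -> bytes

class ComputeNode:
    """Stateless front end: it keeps nothing between requests, so nodes
    can be added, removed, or fail without any state migration."""
    def __init__(self, pool):
        self.pool = pool                   # every compute node sees every storage node
    def _node_for(self, key):
        return self.pool[hash(key) % len(self.pool)]
    def put(self, key, value):
        self._node_for(key).objects[key] = value
    def get(self, key):
        return self._node_for(key).objects[key]

pool = [StorageNode() for _ in range(4)]       # up to 1,000 in VAST's pitch
c1, c2 = ComputeNode(pool), ComputeNode(pool)  # up to 10,000 stateless nodes
c1.put("checkpoint-42", b"model weights")
assert c2.get("checkpoint-42") == b"model weights"  # any compute node serves any request
```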

Step back a moment and reflect that Pure Storage is now an incumbent with many FlashBlade customers, and needs to introduce a VAST-type disaggregated compute and storage technology without leaving its customer base behind. In a stroke of genius, co-founder John Colgrove decided to have separate metadata and data storage nodes, roughly similar to Lustre, but to make FlashBlade arrays the metadata nodes.

Pure’s Fusion, with its fleet-level global storage pools, could move existing FlashBlade data to the EXA’s data nodes. These data nodes are simple JBOFs holding 24 of Pure’s proprietary Direct Flash Modules in 75 TB or 150 TB capacities – up to 3.6 PB of raw flash per node – with 300 TB and then greater capacities coming later. They provide relatively low-cost, high-density storage.

Pure says client systems accessing the EXA, such as GPU servers, have simpler agent software and get consistent write performance at scale. The EXA metadata nodes communicate with compute cluster clients using pNFS (NFSv4.1 over TCP), with data transmitted using NFSv3 over RDMA.
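
As a rough illustration of that protocol split, here is a sketch of the client-side flow as Pure describes it: one pNFS (NFSv4.1/TCP) layout exchange with a FlashBlade metadata node, then bulk reads over NFSv3/RDMA straight from the JBOF data nodes. The function names and the in-memory stand-ins for the two transports are ours, not Pure's client software; a real client would use the kernel pNFS layout driver rather than application code.

```python
"""A minimal sketch of the FlashBlade//EXA access split as Pure describes it.
All names and the in-memory transport stand-ins are illustrative only."""
from dataclasses import dataclass

@dataclass
class Segment:
    data_node: str   # which JBOF holds this piece of the file
    index: int       # position of the piece within the file

# Stand-in for the control path: an NFSv4.1/TCP LAYOUTGET answered by a
# FlashBlade metadata node.
LAYOUTS = {"/ai/shard-7": [Segment("jbof-1", 0), Segment("jbof-2", 1)]}
def pnfs_layoutget(path):
    return LAYOUTS[path]

# Stand-in for the data path: NFSv3/RDMA reads served directly by the JBOFs,
# so bulk traffic never transits the metadata nodes.
JBOF_BLOCKS = {("jbof-1", 0): b"tensor-", ("jbof-2", 1): b"shard"}
def nfs3_rdma_read(node, index):
    return JBOF_BLOCKS[(node, index)]

def read_file(path):
    layout = pnfs_layoutget(path)      # control path: one metadata exchange
    return b"".join(nfs3_rdma_read(s.data_node, s.index) for s in layout)

assert read_file("/ai/shard-7") == b"tensor-shard"
```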

The EXA system scales to exabytes and delivers more than 10 TBps of bandwidth, with 3.4 TBps from a single rack. General availability is due in the summer, with S3 over RDMA, Nvidia certification, and Fusion integration coming after that.

Now Pure has an AI training-capable storage system and an answer to DDN’s Infinia, Hammerspace, HPE with its Alletra Storage MP, Huawei’s A800, VAST Data, and WEKA.

VDURA will deliver RDMA and GPUDirect optimizations later this year. Object storage supplier MinIO has announced support for S3 over RDMA, while Cloudian and Scality have also announced fast object delivery to Nvidia GPU servers.

That leaves four storage suppliers waiting in the wings. Dell, with its future PowerScale parallelism, and NetApp, with its ONTAP Data Platform for AI project, have both yet to deliver the goods. Qumulo has not committed to delivering GPUDirect support, although it has said it could do so quite quickly; nor has Infinidat. Once Infinidat has been absorbed by Lenovo, it might support GPUDirect alongside its existing RAG workflow deployment architecture for generative AI inferencing workloads.

We should note that Dell has been energetic in supporting AI workloads with its servers and its AI Factory initiatives.

Apart from these four, the rest of the mainstream incumbent file and object storage suppliers have all substantially reshaped their technologies to support generative AI’s need for extreme, exabyte-level storage capacity, RDMA-level latency, and parallel-style read and write data access.