Chinese supplier YanRong Technology says it has the leading performance in the MLPerf Storage v1.0 benchmark, ranking first in the CosmoFlow, ResNet50 and UNet3D workloads.
The firm sells its all-flash F9000X storage, which runs its high-performance YRCloudFile distributed file system. The F9000X uses NVMe SSDs, InfiniBand and Ethernet RDMA networking, and Nvidia's GPUDirect protocol, which bypasses the host server's CPU and DRAM, to deliver up to 260GB/sec and 7.5 million IOPS from a three-node storage cluster.

The key result in the benchmark, according to MLCommons, is the number of A100 or H100 GPUs supported by a storage system, or device, at a minimum of 90 percent utilization for each of three workloads: CosmoFlow, ResNet50 and UNet3D.
YanRong tested a three-node F9000X cluster on all three workloads.
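As a rough sketch of what that utilization threshold means in practice, the snippet below models accelerator utilization as compute time divided by compute time plus time stalled waiting on data. It is a simplification for illustration, not the actual MLPerf Storage harness logic.

```python
# Simplified model of the benchmark's utilization idea (illustrative only):
# a GPU counts as "supported" if it spends at least 90 percent of the run
# computing rather than stalled waiting on storage.
def accelerator_utilization(compute_seconds: float, io_wait_seconds: float) -> float:
    """Fraction of wall-clock time the accelerator spends doing useful work."""
    return compute_seconds / (compute_seconds + io_wait_seconds)

# Example: 95 seconds of compute and 4 seconds of data stalls is roughly 0.96,
# comfortably above the 90 percent floor.
print(accelerator_utilization(95.0, 4.0))
```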

It said that, on a single host, the F9000X can sustain up to:
- 60x H100 accelerators training CosmoFlow at 34GB/sec
- 188x H100 accelerators training ResNet50 at 37GB/sec
- 20x H100 accelerators training UNet3D at 58GB/sec
In tests with three hosts, it said the F9000X can sustain up to:
- 120x H100 accelerators training CosmoFlow at 72GB/sec
- 540x H100 accelerators training ResNet50 at 103GB/sec
- 60x H100 accelerators training UNet3D at 169GB/sec
It claimed that, in the distributed training cluster scenario, YanRong Tech ranked first in both the average number of ACCs (accelerators, meaning GPUs) and the storage bandwidth supported per compute node across all three model tests, “clearly establishing its unquestionable leadership position in AI storage.”
It presented charts showing this alongside the equivalent ratings for DDN, Hammerspace, Huawei, IEIT Systems and WEKA.


This is, to emphasize the point, the average number of GPUs and storage bandwidth supported per compute node – not the actual benchmark performance number where, for example, Huawei and others scored more highly than YanRong.
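A minimal sketch of that per-compute-node normalization, assuming the three-host figures quoted above are simply divided across the three hosts (the official MLCommons result tables are the authoritative source and may break this down differently):

```python
# Per-compute-node averages derived from the quoted three-host figures.
# Illustrative arithmetic only, not the official MLPerf result data.
three_host_results = {
    # workload: (H100 accelerators supported, bandwidth in GB/sec)
    "CosmoFlow": (120, 72),
    "ResNet50": (540, 103),
    "UNet3D": (60, 169),
}

hosts = 3
for workload, (gpus, gbps) in three_host_results.items():
    print(f"{workload}: {gpus / hosts:.0f} GPUs and "
          f"{gbps / hosts:.1f} GB/sec per compute node")
```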
We charted the number of GPUs supported in the UNet3D workload by all the suppliers and their submitted systems to show this.

YanRong Technology submitted a three-node cluster for its highest results. More nodes could be added to support a higher number of GPUs. The more GPUs supported per storage node, the fewer nodes are needed to support a specific number of GPUs, as the sketch below illustrates.
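Here is a minimal sketch of that scaling arithmetic; the per-node figures used in the example are hypothetical, not published YanRong specifications.

```python
import math

# Illustrative only: how the number of GPUs a single storage node can feed
# translates into the size of the storage cluster needed for a given GPU fleet.
def storage_nodes_needed(target_gpus: int, gpus_per_storage_node: int) -> int:
    """Minimum storage nodes required to keep target_gpus fully fed."""
    return math.ceil(target_gpus / gpus_per_storage_node)

# Hypothetical example: a fleet of 1,000 GPUs served by nodes that each
# sustain 100 GPUs needs 10 storage nodes; at 200 GPUs per node it needs 5.
print(storage_nodes_needed(1000, 100))  # -> 10
print(storage_nodes_needed(1000, 200))  # -> 5
```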
YanRong also says its system’s “bandwidth performance maintained a significant linear growth capability” as the number of GPUs increased.
It believes AI and machine learning workloads are set to grow exponentially, and is working on future improvements intended to ensure that storage bottlenecks are eliminated.