What’s behind YanRong’s MLPerf benchmark? Parallel file system software, GPUDirect, and PCIe 5 SSDs

Chinese server and storage supplier YanRong Technologies suddenly showed up on many radars when it appeared in the MLPerf v1 storage benchmark results with a high score a few weeks ago, and that warrants a closer look.

Haitao Wang

YanRong was founded in 2016 by CEO Haitao Wang, an ex-IBMer. It is a Beijing-based business which raised a seed round in 2018, a CN¥50 million ($4.2 million) A-round in 2019, and a CN¥200 million ($28 million) B-round this year. It produces products such as the F9000X all-flash array and YRCloudFile software. The biz saw 80 percent year-on-year growth in the first quarter of 2024, according to an IDC China SDS market report. The report adds that it holds a top-six position in the global IO500 Storage List for high-performance computing.

An F9000X datasheet says it is a scale-out server equipped with 5th-gen Xeon CPUs and a PCIe 5 bus that connects them to U.2-format NVMe SSDs and Nvidia ConnectX-7 host channel adapters. The datasheet does not specify the DRAM capacity range.

It supports 400Gbit/sec InfiniBand and 400GbE RoCE networking, and Nvidia’s GPUDirect protocol for direct GPU server memory to storage drive data access.

YanRong F9000X

The networking side can operate TCP and RDMA access at the same time. It supports bandwidth performance aggregation across multiple networks through multi-channel functionality. The system supports POSIX, NFS, SMB and CSI with Linux/Windows POSIX clients.

The F9000X can have up to 22 NVMe SSDs for data storage, in capacities including 7.68TB, 15.36TB, and 30.72TB. Each node also has 2x 1.6TB NVMe SSDs used as metadata disks and 2x 480GB SATA SSDs for the operating system software. Both data nodes and metadata nodes can be expanded as needed.
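As a back-of-envelope check, the raw data capacity per node follows directly from the drive count and drive size. The sketch below assumes a fully populated node with 22 data drives of a single size and ignores any data protection overhead; the helper name is ours for illustration, not anything from YanRong.

```python
# Raw data capacity of one F9000X node, assuming all 22 data bays hold
# drives of the same size. Drive counts and sizes come from the article;
# raw_capacity_tb is a hypothetical helper for illustration only.
DATA_DRIVES_PER_NODE = 22
DRIVE_SIZES_TB = [7.68, 15.36, 30.72]


def raw_capacity_tb(drive_tb: float, drives: int = DATA_DRIVES_PER_NODE) -> float:
    """Return raw (pre-protection) capacity in TB for one node."""
    return drive_tb * drives


for size in DRIVE_SIZES_TB:
    print(f"{size:>5} TB drives: {raw_capacity_tb(size):.2f} TB raw per node")
```

That works out to roughly 169TB, 338TB, and 676TB of raw flash per node, before replicas or erasure coding are taken into account.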

It achieves up to 260GB/sec of bandwidth and 7.6 million IOPS in a three-node cluster.
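To put those cluster numbers in per-node terms, a quick calculation helps. The sketch below assumes both figures apply to the same three-node cluster and that load is spread evenly across nodes. It also compares the per-node bandwidth with the raw line rate of a single 400Gbit/sec port, ignoring protocol overhead. The article does not say how many ports each node has, so the link count is only a lower bound implied by the arithmetic.

```python
import math

# Figures quoted in the article for a three-node cluster.
CLUSTER_GB_PER_SEC = 260.0
CLUSTER_IOPS = 7.6e6
NODES = 3
LINK_GBIT_PER_SEC = 400.0  # one InfiniBand or RoCE port

# Assumption: load is spread evenly across the nodes.
per_node_gb = CLUSTER_GB_PER_SEC / NODES
per_node_iops = CLUSTER_IOPS / NODES

# 400 Gbit/sec is 50 GB/sec at raw line rate (8 bits per byte).
link_gb = LINK_GBIT_PER_SEC / 8

# Lower bound on the 400G ports each node needs to carry its share.
min_links = math.ceil(per_node_gb / link_gb)

print(f"Per node: {per_node_gb:.1f} GB/sec, {per_node_iops / 1e6:.2f}M IOPS")
print(f"One 400G link: {link_gb:.0f} GB/sec, so at least {min_links} links per node")
```

On these assumptions each node delivers about 87GB/sec, which is more than a single 400Gbit/sec port can carry, so each node would need at least two such ports.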

The OS provides two replicas and uses erasure coding for data protection. There is no snapshot functionality – a point which is remarked upon in Gartner Peer Insights about the system.
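The capacity cost of the two protection approaches can be sketched with simple ratios. The article does not state which erasure coding layouts the system uses, so the 4+2 and 8+2 schemes below are purely hypothetical examples, and "two replicas" is assumed to mean two full copies of the data.

```python
def usable_fraction_replication(copies: int) -> float:
    """Fraction of raw capacity left for data when keeping `copies` full copies."""
    return 1 / copies


def usable_fraction_ec(data: int, parity: int) -> float:
    """Fraction of raw capacity left for data with a data+parity erasure code."""
    return data / (data + parity)


# Assumption: "two replicas" means two full copies of every block.
print(f"2 copies: {usable_fraction_replication(2):.0%} usable")

# Hypothetical erasure coding layouts; the article does not name any.
for data, parity in [(4, 2), (8, 2)]:
    print(f"EC {data}+{parity}: {usable_fraction_ec(data, parity):.0%} usable")
```

Under these assumptions, two copies leave half the raw capacity usable, while the example erasure coding layouts leave about two-thirds and four-fifths respectively.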

Other suppliers, such as Supermicro, can ship storage servers with similar hardware capabilities. YanRong squeezes a lot of performance from the hardware with its YRCloudFile software.

This software provides a parallel filesystem with a global namespace and has optimizations for better RDMA performance in data/task affinity, cache alignment, request encapsulation, and page locking. The system can support data accesses by thousands of clients simultaneously. YanRong says that, as the number of nodes increases, both storage capacity and performance grow linearly.

We’re told that, under the same configuration and network conditions, the YRCloudFile Windows client can achieve three times the I/O performance of industry-standard SMB protocols.

The data and metadata nodes can be scaled independently to suit different workloads. A node’s storage media and networking can be configured to suit particular workloads as well.

The YRCloudFile software is decoupled from any particular hardware. It can run on the F9000X, any standard x86 server, or a public cloud server instance, and it’s available in the AWS marketplace.

The software can automatically and transparently move cold data to object storage and supports loading data from its own drives and cloud object storage for applications. The mapping between bucket and directory integrates “seamlessly with object and file storage for efficient data transfer.”

YRCloudFile should be compared to DDN’s Lustre, IBM’s Storage Scale, Vdura’s PanFS and WEKA’s parallel filesystem software offerings. There is a YRCloudFile Distributed File System brochure available, but it requires registration, after which YanRong sends download details via email.

Bootnote

The F9000X’s dimensions, without a security panel, are 87.5mm high x 445.4mm wide x 780mm deep. The depth is 808mm with the security panel.