Western Digital claims its OpenFlex Data24 system loaded with NVMe SSDs and an NVMe over Fabrics (NVMeoF) adapter does read and write IO across an Nvidia GPUDirect link faster than NetApp’s ONTAP or BeeGFS arrays.
The OpenFlex Data24 is a 2U x 24-drive slot enclosure introduced in 2020. The upgraded 3200 series was launched in August last year. It features dual-port SSDs and a RapidFlex fabric bridge, supporting NVMeoF across both RDMA (RoCE) and TCP for improved performance. Western Digital has published an Nvidia GPUDirect Storage technical brief benchmarking the OpenFlex Data24 system with GPUDirect.
As GPUDirect bypasses a server’s host CPU and DRAM, it enables direct read and write operations to and from the server’s NVMe SSDs, as well as to and from the Data24’s SSDs via its fabric bridge. The Data24 is just another disaggregated storage system in this sense.
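For readers unfamiliar with the mechanism, the sketch below shows what a GPUDirect Storage transfer looks like from an application's point of view, using Nvidia's cuFile API to DMA data from a file on an NVMe (or NVMeoF-attached) device straight into GPU memory. The file path and transfer size are hypothetical and error handling is trimmed; this is a minimal illustration of the call path, not Western Digital's benchmark code.

```c
#define _GNU_SOURCE              /* for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    const char *path = "/mnt/data24/sample.bin";  /* hypothetical file on an NVMe/NVMe-oF device */
    const size_t len = 1 << 20;                   /* 1 MiB transfer, purely illustrative */

    cuFileDriverOpen();                           /* initialise the GPUDirect Storage driver */

    int fd = open(path, O_RDONLY | O_DIRECT);     /* GDS requires O_DIRECT file access */
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);            /* register the file with cuFile */

    void *gpu_buf = NULL;
    cudaMalloc(&gpu_buf, len);                    /* destination buffer lives in GPU memory */
    cuFileBufRegister(gpu_buf, len, 0);           /* register/pin the GPU buffer for DMA */

    /* Data moves from the drive into GPU memory without staging in host DRAM */
    ssize_t got = cuFileRead(fh, gpu_buf, len, 0 /* file offset */, 0 /* buffer offset */);
    printf("read %zd bytes directly into GPU memory\n", got);

    cuFileBufDeregister(gpu_buf);
    cudaFree(gpu_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

Linked against Nvidia's libcufile, the same call path applies whether the file sits on a server-local SSD or on a drive presented over NVMeoF from an enclosure such as the Data24.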
Typically, Nvidia GPUs are fed data either from parallel file systems, or from NVMeoF-supporting arrays such as those from NetApp, Pure Storage, and VAST Data. Note that VAST’s array architecture provides parallel file system performance.
The Western Digital tech brief provides detailed configuration data to show that it is a valid GPUDirect data source.
Its benchmarked GPUDirect IO performance is 54.56 GBps read bandwidth and 52.6 GBps write bandwidth. B&F has tracked GPUDirect storage performance from various suppliers and can compare the OpenFlex Data24 performance with them on a per-node basis.
Readers can see in B&F's chart that there are two groups of higher-performing systems, above 62.5 GBps bandwidth. To the left are DDN and IBM with their parallel file system software – Lustre and Storage Scale respectively. On the right is WEKApod, with its parallel Data Platform file system, and three PEAK:AIO results. These come from PCIe gen 5-supporting servers and PEAK:AIO's rewritten NAS software.
The Western Digital OpenFlex scores are unusual in that the read and write numbers are nearly equal. They beat the NetApp ONTAP and BeeGFS numbers, and Pure Storage's write bandwidth but not its read bandwidth. They also beat VAST Data and WEKA write bandwidth, but lag well behind WEKApod on reads and slightly behind VAST Data on reads.
Western Digital noted that the “disaggregation of NVMe to NVME-oF only adds ∼10μs when compared to in-server NVMe drives.”
Its tech brief concludes that “while somewhat dependent on compute, storage capacity and performance requirements, the consumer ultimately has choice over which GPU servers, GPU RNICs, network components and SSD model to incorporate; all within a clearly defined cost model.”