WekaIO’s filesystem has transferred a terabyte of data in less than nine seconds, using an Nvidia DGX-2 server in tandem with Nvidia’s GPUDirect. That works out at a blindingly fast 113.1GB/sec.
Update: Additional data has been added to the 113.1GB/sec benchmark details section.
GPUDirect Storage (GDS) enables a storage system to send data directly to a server’s GPUs, bypassing the host server’s operating system IO stack and infrastructure. It does this by enabling direct memory access (DMA) between GPU memory and NVMe storage drives. WekaIO had previously achieved 82GB/sec of throughput to a DGX-2.
In this new test, Microsoft Research benchmarked the updated WekaIO 3.8 software, hooked across InfiniBand to a DGX-2 and using GPUDirect. The testers initially achieved 97.9GB/sec of throughput, with a single mount point to the WekaFS system. This is the highest throughput of any GPUDirect system tested to date. The result was verified by running Nvidia’s GDSIO utility for more than 10 minutes and getting sustained performance over that time.
For comparison, VAST Data has achieved 92.6GB/sec pumping data to a DGX-2. WekaIO says its software was not running flat out. The first test configuration used 10 single-port network interface cards (NICs); replacing these with dual-port NICs doubled the port count to 20.
The testers added more GDSIO processes to use these ports, use more of the available PCIe bandwidth, and put more load on the GPUs. This second test configuration achieved 113.13GB/sec throughput, 38 per cent faster than WekaIO’s previous 82GB/sec result and about 16 per cent faster than the first run.
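The percentages and the headline transfer time can be checked with a few lines of arithmetic. The sketch below uses only the throughput figures quoted in this article; the variable and function names are illustrative, and it assumes decimal units (1TB = 1,000GB) and that the 38 per cent figure is measured against the earlier 82GB/sec result.

```python
# Throughput figures quoted in the article, in GB/sec.
PREVIOUS_WEKA = 82.0   # WekaIO's earlier DGX-2 result
FIRST_RUN = 97.9       # 10 NIC ports, single mount point
SECOND_RUN = 113.13    # 20 NIC ports, more GDSIO processes


def percent_gain(new: float, old: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


def seconds_to_transfer(gigabytes: float, gb_per_sec: float) -> float:
    """Return the time needed to move `gigabytes` at a steady `gb_per_sec`."""
    return gigabytes / gb_per_sec


gain_vs_previous = percent_gain(SECOND_RUN, PREVIOUS_WEKA)
gain_vs_first_run = percent_gain(SECOND_RUN, FIRST_RUN)
# Assumption: 1TB is taken as 1,000GB (decimal units).
terabyte_seconds = seconds_to_transfer(1000.0, SECOND_RUN)

print(f"vs 82GB/sec:   {gain_vs_previous:.1f}% faster")
print(f"vs 97.9GB/sec: {gain_vs_first_run:.1f}% faster")
print(f"1TB takes {terabyte_seconds:.2f} seconds")
```

On these numbers the second run is roughly 38 per cent ahead of the 82GB/sec result and roughly 16 per cent ahead of the first run, and a terabyte moves in a little under nine seconds.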
The testers reckoned the system ran at 5 million IOPS with the 113.1GB/sec throughput. A source at Microsoft said the 113.1GB/sec reflected bandwidth to the host CPUs as well as the GPUs. Any comparison of this WekaIO result to other suppliers’ standard DGX-2 results is invalid, as it used a non-standard DGX-2.
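Taken together, the IOPS and throughput figures imply an average IO size. The article does not state the IO size the GDSIO runs used, so the sketch below is only a back-of-envelope estimate; it assumes both figures describe the same run and that GB means 10^9 bytes. The function name is illustrative.

```python
THROUGHPUT_GB_PER_SEC = 113.1  # quoted throughput
IOPS = 5_000_000               # quoted IO operations per second


def average_io_size_bytes(gb_per_sec: float, iops: int) -> float:
    """Return the mean bytes moved per IO, assuming 1GB = 10**9 bytes."""
    return gb_per_sec * 1e9 / iops


avg_bytes = average_io_size_bytes(THROUGHPUT_GB_PER_SEC, IOPS)
print(f"Implied average IO size: {avg_bytes / 1000:.1f}KB")
```

Under those assumptions the average works out to a little under 23KB per IO.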
A source knowledgeable about benchmark matters said: “Once you swap out NICs on a DGX-2 it’s not really a DGX-2 but a DGX-2 with an aftermarket fuel injection chip, turbocharger, and muffler, not the same.”
The WekaIO performance is great but… how much does it cost? Without that input it is impossible to assess price-performance. We wait patiently for this. In the meantime, DDN and Excelero, which also support GPUDirect, have yet to release performance numbers. They now have a target to aim for.