Excelero boosts BeeGFS to speed up GPU-driven AI work

ThinkParQ has integrated Excelero’s NVMesh software into the BeeGFS parallel clustered file system.

NVMesh provides NVMe-over-Fabrics access to all-flash storage, which can be configured as an all-flash array or as hyperconverged storage. In this combination it supplies the storage layer for BeeGFS, the open source file system maintained by ThinkParQ.

BeeGFS clients access the storage servers directly and in parallel, with file access directed by separate metadata servers. Files are split into chunks and striped across the storage servers. Metadata can also be spread across several metadata servers for scalability.
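To illustrate how chunk striping lets clients read from many storage servers at once, here is a minimal Python sketch of round-robin (RAID-0 style) striping. It is a simplified model, not BeeGFS code: the function names, the 512 KiB chunk size and the four-target stripe width are illustrative assumptions. In this model the metadata server would hold the stripe pattern (chunk size and target list), and a client uses it to work out which storage target holds any given byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPlacement:
    chunk_index: int    # position of the chunk within the file
    target: int         # index into the file's list of storage targets
    target_offset: int  # byte offset of the chunk within that target's data


def place_chunks(file_size: int, chunk_size: int, num_targets: int) -> list:
    """Hypothetical helper: stripe a file round-robin over storage targets."""
    if chunk_size <= 0 or num_targets <= 0:
        raise ValueError("chunk_size and num_targets must be positive")
    num_chunks = -(-file_size // chunk_size)  # ceiling division
    placements = []
    for i in range(num_chunks):
        target = i % num_targets
        # Each target stores every num_targets-th chunk back to back.
        target_offset = (i // num_targets) * chunk_size
        placements.append(ChunkPlacement(i, target, target_offset))
    return placements


def locate(offset: int, chunk_size: int, num_targets: int) -> tuple:
    """Hypothetical helper: map a file byte offset to (target, offset in target)."""
    chunk_index = offset // chunk_size
    target = chunk_index % num_targets
    target_offset = (chunk_index // num_targets) * chunk_size + offset % chunk_size
    return target, target_offset


if __name__ == "__main__":
    CHUNK = 512 * 1024  # assumed 512 KiB chunk size
    for p in place_chunks(2_621_440, CHUNK, 4):  # a 2.5 MiB file over 4 targets
        print(p)
    print(locate(1_300_000, CHUNK, 4))
```

With a 2.5 MiB file, the five chunks land on targets 0, 1, 2, 3 and then 0 again, so a large sequential read can be served by all four targets in parallel. Byte 1,300,000 falls in the third chunk, on target 2.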

AI applications running on GPU compute clusters hit an IO bottleneck when data arrives from external storage too slowly to keep the GPUs busy. According to Excelero and ThinkParQ, NVMesh can speed up GPU clusters' access to external storage.

The two companies tested the setup on a 2U, four-server chassis holding 24 NVMe SSDs and running NVMesh, connected over a 100Gbit RDMA network to eight BeeGFS client compute nodes. The system delivered 75GB/sec of sequential bandwidth and 1.25 million random write IOPS. Without NVMesh, random write IOPS fell to 251,000.
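A quick back-of-envelope check puts the reported figures in context. The sketch below uses only the published numbers plus two labelled assumptions: that each client has a single 100Gbit port, and that the 75GB/sec aggregate is spread evenly across the eight clients (GB taken as 10^9 bytes).

```python
# Figures reported by Excelero and ThinkParQ for the test system.
SEQ_BANDWIDTH_GBYTES = 75       # aggregate sequential bandwidth, GB/s
CLIENTS = 8                     # BeeGFS client compute nodes
IOPS_WITH_NVMESH = 1_250_000    # random write IOPS with NVMesh
IOPS_WITHOUT_NVMESH = 251_000   # random write IOPS without NVMesh

# Assumption (not stated in the report): one 100Gbit RDMA port per client.
LINK_GBITS = 100


def per_client_gbytes() -> float:
    # Assumes the aggregate bandwidth is shared evenly by all clients.
    return SEQ_BANDWIDTH_GBYTES / CLIENTS


def per_client_link_fraction() -> float:
    # Convert GB/s to Gbit/s (x8) and compare with the assumed port speed.
    return per_client_gbytes() * 8 / LINK_GBITS


def iops_speedup() -> float:
    return IOPS_WITH_NVMESH / IOPS_WITHOUT_NVMESH


if __name__ == "__main__":
    print(f"per-client bandwidth: {per_client_gbytes():.3f} GB/s")
    print(f"share of a 100Gbit port: {per_client_link_fraction():.0%}")
    print(f"random write IOPS gain: {iops_speedup():.2f}x")
```

Under these assumptions each client averages about 9.4GB/sec, roughly three quarters of a 100Gbit link, and NVMesh lifts random write IOPS by a factor of about five.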

Excelero and ThinkParQ will show BeeGFS running with NVMesh at ISC High Performance 2019, June 16-20 in Frankfurt, at ThinkParQ’s booth J-640 and Excelero’s booth E-1039.