DDN has lifted the covers off its AI400X2 Turbo appliance, new AI storage hardware with more network ports and memory than its AI400X2 predecessor.
The AI400X2 Turbo is an update of the existing ExaScaler AI400X2 Lustre parallel filesystem storage array. It has twice the number of Nvidia ConnectX 400G network cards, and the controller has 30 percent more memory. The updated system can, we’re told, now deliver 75 GBps write bandwidth, 120 GBps read bandwidth, and 3 million IOPS, a 33 percent performance improvement over the AI400X2. These figures apply with or without Nvidia’s GPUDirect Storage.
James Coomer, DDN senior veep for products, told B&F: “NVIDIA GPUDirect Storage allows the IO to bypass the compute-side CPU when moving data between storage and GPU memory. This results in higher bandwidths, lower latency, and lower CPU consumption. It’s just one of a set of optimizations we use to create an optimal datapath between the filesystem and the AI application.”
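For readers unfamiliar with the mechanism, here is a minimal sketch of a GPUDirect Storage read using Nvidia’s cuFile API from libcufile. The mount path, transfer size, and stripped-down error handling are illustrative assumptions on our part, not details of DDN’s datapath:

```c
/* Minimal GPUDirect Storage read sketch using Nvidia's cuFile API.
   Path and size are illustrative; error checking omitted for brevity. */
#define _GNU_SOURCE
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const size_t len = 1 << 20;                 /* 1 MiB, illustrative */
    int fd = open("/mnt/exascaler/data.bin", O_RDONLY | O_DIRECT);

    cuFileDriverOpen();                         /* initialize the cuFile driver */

    CUfileDescr_t descr = {0};
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    descr.handle.fd = fd;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);      /* register the open file */

    void *dev_buf;
    cudaMalloc(&dev_buf, len);                  /* destination buffer in GPU memory */
    cuFileBufRegister(dev_buf, len, 0);         /* pin the buffer for DMA */

    /* DMA straight from storage into GPU memory, skipping a host bounce buffer */
    ssize_t n = cuFileRead(handle, dev_buf, len, 0, 0);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```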
DDN has a solid presence among the cloud service providers (CSPs) supplying Nvidia GPUs-as-a-service, counting Bitdeer, Lambda, NAVER, OCI, Scaleway, and Vultr as customers for its appliances, with OCI also using DDN’s ExaScaler software alongside NVMesh.
A DDN spokesperson told us that Nvidia’s EOS SuperPOD, the largest DGX H100 SuperPOD, uses the company’s AI400X2 storage appliance.
Coomer explained how DDN optimizes storage for AI workloads: “At DDN, we have 3 primary approaches to make AI applications go faster by reducing IO wait times: 1) optimize exactly for the specific, common AI framework behaviors (the POSIX call, the number of threads, the IO patterns, etc.); 2) cross-stack optimizations (integrating the storage SW up the stack – e.g. with NVIDIA DGX, with network, with containers); and 3) making the storage itself fast (optimizing backend and HW/SW integration).”
“Our experience with AI frameworks is that we need to (a) optimize for the mmap call since that’s often how AI frameworks request data, (b) optimize the data sent to 1 thread, (c) make writes go fast since checkpoints are a major component, (d) leverage client caching to reduce unnecessary data transfers, (e) ensure that threading scales well so that additional application IO threads get linearly scaling data volumes, and (f) [use] Nvidia GPUDirect Storage.”
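As a rough illustration of the first of those points, here is a sketch of the mmap-style access pattern many AI data loaders generate, which the filesystem client then sees as page-granular reads. The file path, record size, and shuffle stride are our own illustrative assumptions:

```c
/* Sketch of an mmap-driven dataset read path, as used by many AI data loaders.
   Paths and record sizes are illustrative assumptions, not DDN specifics. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/mnt/exascaler/train.bin", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* Map the whole file once; pages fault in on first touch, so the
       filesystem sees page-sized reads rather than read() syscalls. */
    const uint8_t *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    madvise((void *)data, (size_t)st.st_size, MADV_RANDOM); /* shuffled sample access */

    const size_t record = 4096;                    /* illustrative sample size */
    uint8_t sample[4096];
    size_t n_records = (size_t)st.st_size / record;
    for (size_t i = 0; i < n_records; i += 97) {   /* pseudo-shuffled stride */
        memcpy(sample, data + i * record, record); /* the page fault drives the IO */
    }

    munmap((void *)data, (size_t)st.st_size);
    close(fd);
    return 0;
}
```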
The BlueField-3 booster
DDN wants to push the performance limits further: “We also are planning to talk more about our adoption of the NVIDIA BlueField-3 DPU, as well as fantastic performance (we are pushing over 7 GBps to 1 thread!) with NVIDIA Grace CPUs to enable new efficiencies in the datacenter.”
“DDN will be leveraging DPUs in both DDN EXAScaler and DDN Infinia, and is also starting to use DPUs in a novel way for much higher gains in performance and overall infrastructure efficiency.”
“DDN Infinia is a storage platform that is implemented entirely in containers, with different low-level storage functions running in different containers and all those containers working together to form a single scalable service. This architecture allows us to pull out some services from the core storage and execute them in DPUs, even DPUs hosted on [the] compute side. We do this to take advantage of the scaling, additional CPU resources and to change the whole end-to-end datapath, reducing latency and infrastructure for the entire system.”
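To show the shape of that idea, and only as a toy, here is a generic sketch of a data-path function (a simple checksum) factored out into a small network service of the kind that could run in a container on a DPU’s Arm cores. The port, framing, and checksum are our own assumptions; none of this is DDN Infinia code:

```c
/* Toy offload service: compute a checksum for each chunk a client streams in.
   Generic illustration of moving a storage function off the core datapath;
   port, framing, and checksum choice are assumptions, not Infinia internals. */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static uint32_t checksum(const uint8_t *p, size_t n) {
    uint32_t s = 0;                           /* trivial rolling checksum */
    for (size_t i = 0; i < n; i++) s = s * 31 + p[i];
    return s;
}

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* e.g. the DPU-side address */
    addr.sin_port = htons(9000);              /* illustrative port */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {
        int c = accept(srv, NULL, NULL);
        uint8_t buf[65536];
        ssize_t n;
        while ((n = read(c, buf, sizeof buf)) > 0) {
            uint32_t s = htonl(checksum(buf, (size_t)n));
            write(c, &s, sizeof s);           /* return the per-chunk result */
        }
        close(c);
    }
}
```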
VAST Data has recently adopted BlueField-3 DPUs to run its storage controller software.