Dell amplifies AI workload support with Intel Gaudi3

Dell has added support for Intel's Gaudi3 AI accelerator to its XE9680 server and brought APEX File Storage to Microsoft Azure, both in support of AI workloads.

The XE9680 server was announced in January 2023 with fourth-generation Xeon processors (up to 56 cores), a PCIe 5.0 bus, and support for up to eight Nvidia GPUs. By October it had become the fastest-ramping server in Dell's history. As of March this year it supports Nvidia's H200 GPU, plus the air-cooled B100 and the liquid-cooled HGX B200. Intel's Gaudi3 accelerator has two linked compute dies with a combined eight matrix math engines, 64 tensor processor cores, 96 MB of SRAM cache, 16 lanes of PCIe 5.0, 24 x 200 GbE links, 128 GB of HBM2e memory, and 3.7 TBps of memory bandwidth.

Now the XE9680 is gaining Gaudi3 AI accelerator support. The Gaudi3 version of the XE9680 has up to 32 DDR5 memory DIMM slots, 16 EDSFF3 flash drives, eight PCIe 5.0 slots, and six OSFP 800GbE ports. It's an on-premises AI processing beast.

Deania Davidson, Dell

A Dell blog written by Deania Davidson, Director of AI Compute Product Planning & Management, says: “The Gaudi3’s open ecosystem is optimized through partnerships and supported by a robust framework of model libraries. Its development tools simplify the transition for existing codebases, reducing migration to a mere handful of code lines.”
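As a hedged illustration of what that "handful of code lines" migration looks like in practice, the sketch below uses Intel's habana_frameworks PyTorch bridge for Gaudi. The package and device names follow Intel's Gaudi software stack; the try/except fallback is our addition so the sketch stays runnable on machines without Gaudi hardware or even PyTorch installed.

```python
# Illustrative sketch only: moving an existing PyTorch script onto Gaudi.
# habana_frameworks and the "hpu" device type come from Intel's Gaudi
# software stack; the CPU fallback is for machines without that stack.
try:
    import torch
    import habana_frameworks.torch.core as htcore  # Gaudi PyTorch bridge
    device = "hpu"   # Gaudi accelerator device type
except ImportError:
    htcore = None
    device = "cpu"   # no Gaudi stack (or no PyTorch): stay on CPU

# In an existing training script, the migration is essentially:
#   model.to(device)               # "cuda" becomes "hpu"
#   ...
#   loss.backward()
#   optimizer.step()
#   if htcore:
#       htcore.mark_step()         # flush the lazy-mode graph on Gaudi

print(f"selected device: {device}")
```

The point of the sketch is that the training loop itself is untouched; only the device string and a per-step synchronization call change.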

The OSFP links allow direct connections to an external accelerator fabric without extra NICs in the system. Davidson says: “Dell has partnered with Intel to allow select customers to begin testing Intel’s accelerators via their Intel Developer Cloud solution.”

APEX File Storage for Azure

Dell launched APEX File Storage for AWS, based on its PowerScale scale-out OneFS software, in May last year. Now it has added APEX File Storage for Microsoft Azure, complementing its existing APEX Block Storage for Azure. In a blog, Principal Product Manager Kshitij Tambe claims the new service is “a game-changing innovation that bridges the gap between cloud storage and AI-driven insights.”

Kshitij Tambe, Dell

APEX File Storage for Azure provides high-performance, scalable multi-cloud file storage for AI use cases. Tambe says customers can “move data from on-premises to the cloud using advanced native replication without having to refactor your storage architecture. And once in the cloud, you can use all enterprise-grade PowerScale OneFS features. With scale-out architecture to support up to 18 nodes and 5.6 PiB in a single namespace, APEX File Storage for Azure offers scalability and flexibility without sacrificing ease of management.”
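To put those scale numbers in context, a quick back-of-the-envelope calculation shows the average capacity each node would contribute in a maxed-out cluster. The even split across nodes is an assumption for illustration; Dell does not publish per-node capacities here.

```python
# Back-of-the-envelope: average per-node share of a maxed-out
# APEX File Storage for Azure cluster (18 nodes, 5.6 PiB namespace).
# Assumes an even split across nodes, which Dell does not state.
PIB = 2**50  # bytes in one pebibyte
TIB = 2**40  # bytes in one tebibyte

namespace_bytes = 5.6 * PIB
nodes = 18

per_node_tib = namespace_bytes / nodes / TIB
print(f"{per_node_tib:.1f} TiB per node on average")
```

That works out to roughly 318.6 TiB of namespace per node at the 18-node maximum.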

Then he compares it to Azure NetApp Files, claiming:

  • 6x greater read throughput cluster performance 
  • Up to 11x larger namespace
  • Up to 23x more snapshots per volume
  • 2x higher cluster resiliency
  • Easier and more robust cluster expansion

He says it delivers the highest performance at scale for AI, based on maximum throughput and namespace capacity. We asked NetApp about these claims, and a spokesperson said: “Azure NetApp Files is a high-performance enterprise file storage service that is a managed native service from Microsoft. While it is based on NetApp’s leading storage OS, the team at Microsoft will be better able to answer your questions about the specifics of performance.”