Nvidia’s Enterprise Reference Architectures (ERAs) are blueprints for building datacenters focused on AI workloads to, as Nvidia says, manufacture intelligence. ERAs help Nvidia systems partners and joint customers build their own AI factories. An ERA provides full-stack hardware and software recommendations, with the hardware side covering servers, clusters, and networking.
An Nvidia blog by Bob Pette, VP and GM for enterprise platforms, claims that each ERA covers:
- Nvidia-certified server configuration, featuring its GPUs, CPUs and networking technologies to deliver performance at scale.
- AI-optimized networking with Nvidia’s Spectrum-X AI Ethernet network and BlueField-3 DPUs to address varying workload and scale requirements.
- Nvidia AI Enterprise software base for production AI, which includes NeMo and NIM microservices for AI applications, and Base Command Manager Essentials for infrastructure provisioning, workload management and resource monitoring.
ERA systems are available from Nvidia partners including Cisco, Dell, HPE, Lenovo and Supermicro, with 23 certified datacenter partners and 577 systems listed in an Nvidia catalog.
The certified servers come in compute, general purpose and high-density VDI categories. The compute category is the one appropriate for ERAs, as it covers AI training and inferencing, data analytics, and HPC.
There is no focus on storage at this ERA level, as Nvidia does not supply storage, although storage is needed to keep Nvidia’s GPUs busy. Instead, storage hardware and software are left to Nvidia’s certified server partners, and they use storage that has Nvidia GPU integration, typically including GPUDirect support with its direct RDMA data transfers between GPU servers and storage drives.
For example, HPE Private Cloud AI’s AI infrastructure stack includes GreenLake for File Storage, based on Alletra MP all-flash storage compute nodes running VAST Data software. This software has Nvidia SuperPOD certification. HPE’s Private Cloud AI itself has Nvidia BasePOD certification and OVX storage validation.
The new Enterprise Reference Architectures are different from the existing Nvidia AI Enterprise: Reference Architectures (note the colon), which specify deployments of Nvidia’s AI Enterprise software suite with VMware vSphere.