VMware wants to play nice with Nvidia DPUs

VMware and Nvidia announced yesterday they are working to make VMware software work better with Nvidia chips. They say the joint initiative, dubbed Project Monterey, will “introduce a new security model that offloads hypervisor, networking, security and storage tasks from the CPU to the DPU”.

The offload target is Nvidia’s BlueField data processing unit (DPU), which takes this work off the host CPU. According to the companies, this should benefit AI, machine learning, and other high-throughput, data-centric applications.

Nvidia CEO Jensen Huang said in the launch announcement: “Nvidia DPUs will give companies the ability to build secure, programmable, software-defined data centres that can accelerate all enterprise applications at exceptional value.”

Paul Perez, SVP and CTO, Infrastructure Solutions Group at Dell Technologies, also provided a statement: “We believe the enterprise of the future will comprise a disaggregated and composable environment.” 

SmartNIC, DPU and BlueField-2

Dell said VMware Cloud Foundation will be able to maintain compute virtualization on the server CPU while offloading networking and storage I/O functions to the SmartNIC CPU. VMware has taken the first step to achieve this by enabling VMware ESXi to run on SmartNICs.

A SmartNIC or DPU is a programmable co-processor that takes over non-application tasks from a server CPU, freeing the server to run more applications, faster. DPUs can compose disaggregated data centre server compute, networking and storage resources. They can also function as intelligent network interface cards that provide security services and network acceleration.

Nvidia’s BlueField-2 is a Mellanox-designed system-on-chip (SoC) that integrates a ConnectX-6 Dx ASIC network adapter with a PCIe Gen 4 x16 lane switch, 2 x 25/50/100 GbitE or 1 x 200GbitE ports, and an array of eight 64-bit Arm cores. The SoC also provides an integrated crypto engine for IPsec and TLS cryptography, integrated RDMA and NVMe-oF acceleration, and dedupe and compression.

Use cases

Three use cases are envisaged. First, BlueField-2 can virtualize disaggregated storage, enabling remote, networked storage to become part of a composable infrastructure. Second, BlueField-2 can let a cloud service provider (CSP) provision bare metal servers to its tenants as an operator service.

VMware said it will re-architect VMware Cloud Foundation to enable disaggregation of the server, including support for bare metal servers, which is a new Cloud Foundation capability. This will let an application running on one physical server consume hardware accelerator resources, such as FPGAs, on other physical servers.

With ESXi running on the SmartNIC, customers will be able to use a single management framework to manage all their virtualized and bare metal compute infrastructure.

Third, BlueField-2 can be used for micro-segmentation at endpoints to isolate application workloads and their resources from each other.

There is also a security aspect to Project Monterey. Each SmartNIC can run a fully featured stateful firewall and an advanced security suite. Thousands of tiny firewalls could be deployed and automatically tuned to protect the specific services that make up an application.

Project Monterey is available as preview code.

Multiple open DPU partnering

VMware is collaborating with DPU suppliers Intel, Nvidia and Pensando, and with system vendors Dell, HPE and Lenovo, to deliver Project Monterey systems. Dell said it could deliver automated systems using SmartNICs from a broad set of vendors.

DPU suppliers include three startups: Fungible, Nebulon, and Pensando. Pensando recently announced it will provide its DPU as a factory-supported option on HPE servers across the VMware Cloud Foundation product line, including vSphere, vSAN, and NSX. Customers will be able to access Pensando’s platform directly within their VMware environment.

Second VMware Nvidia partnership

Separately, VMware announced at VMworld 2020 yesterday that it is building, jointly with Nvidia, a deployment platform that lets VMware-managed servers run AI software on attached Nvidia A100 GPUs. The platform combines VMware’s vSphere, Cloud Foundation and Tanzu container orchestration software with Nvidia’s NGC software.

NGC (Nvidia GPU Cloud) is an online catalogue of GPU-optimised software for deep learning, machine learning, and high performance computing. NGC software will be supported on a select set of pre-tested A100-powered servers that leading system manufacturers are expected to ship.