Analysis. Cisco has announced an Nvidia-based GPU server for AI workloads plus plug-and-play AI PODs that have “optional” storage, though Cisco is not included in Nvidia’s Enterprise Reference Architecture list of partners.
Switchzilla introduced an AI server family purpose-built for GPU-intensive AI workloads with Nvidia accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. The server part of this is the UCS C885A M8 rack server with Nvidia H100 and H200 Tensor Core GPUs and BlueField-3 DPUs to accelerate GPU access to data. The AI PODs for Inferencing are full-stack converged infrastructure designs including servers, networking, and Nvidia's AI Enterprise software portfolio (NVAIE), but they don't actually specify the storage for that data.
A Cisco webpage says that the AI PODs “are CVD-based solutions for Edge Inference, RAG, and Large-Scale Inferencing,” meaning not AI training. CVD stands for Cisco Validated Designs, “comprehensive, rigorously tested guidelines that help customers deploy and manage IT infrastructure effectively.”
The webpage has an AI POD for Inferencing diagram showing the components, which include an Accelerated Compute (server) element.
We’re told that Cisco AI Infrastructure PODs for Inferencing have independent scalability at each layer of infrastructure and are suited to datacenter or edge AI deployments. There are four configurations, which vary the number of CPUs and GPUs in the POD. Regardless of the configuration, they all contain:
- Cisco UCS X-Series Modular System
- Cisco UCS X9508 Chassis
- Cisco UCS X-Series M7 Compute Nodes
Note the M7 Compute Nodes, meaning Cisco’s seventh UCS server generation. The new M8-generation GPU server is not included and thus is not part of this AI POD. Nor are Nvidia’s BlueField-3 SuperNICs/DPUs.
Because of this, we think that Cisco’s AI POD for Inferencing does not meet Nvidia’s Enterprise Reference Architecture (RA) requirements, which would explain why Cisco was not listed as a partner by Nvidia. The Enterprise RA announcement said: “Solutions based upon Nvidia Enterprise RAs are available from Nvidia’s global partners, including Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro.”
We have asked both Cisco and Nvidia about Cisco being an Enterprise RA partner and the AI PODs being Enterprise RA-validated systems. A Cisco spokesperson answered our questions.
Blocks & Files: Are Cisco AI PODs part of the NVIDIA RA program, and if not, why?
Cisco: Nvidia has previously introduced reference architectures for cloud providers and hyperscalers, and their recent announcement extends those RAs to enterprise deployments. Their RA program isn’t dissimilar to Cisco’s Validated Designs. A key component of Nvidia’s RAs is their Spectrum-X Ethernet networking, which is not offered as part of Cisco’s AI PODs. Additionally, the AI PODs will, over time, offer choice in GPU provider. Regardless of PODs vs RAs, Cisco and Nvidia are in agreement that our customers need us to help them along on this journey by simplifying our offers and providing tried and tested solutions that help them move faster.
Blocks & Files: Do the AI PODs include the latest UCS C885A M8 servers?
Cisco: The UCS C885A M8 is not part of an AI POD today, but it is planned for future PODs. The UCS C885A M8 was just announced at Cisco Partner Summit and will start shipping in December. At that time, Cisco will develop Validated Designs, which will be used as the basis for creating AI PODs for training and large-scale inferencing. All to say: more to come.
****
There is no storage component identified in the AI POD diagram above, despite AI PODs being described as “pre-sized and configured bundles of infrastructure [which] eliminate the guesswork from deploying AI inference solutions.”
Instead, either Pure Storage or NetApp is diagrammed as providing a converged infrastructure (CI) component. The webpage says: “Optional storage is also available from NetApp (FlexPod) and Pure Storage (FlashStack).”
We find this odd on two counts. First, AI inferencing is critically dependent on potentially large amounts of data, which must be stored somewhere, yet the storage part of an AI POD is “optional.” Second, optional storage hardly helps “eliminate the guesswork from deploying AI inference solutions.”
Blocks & Files: Why is storage optional in the AI PODs?
Cisco: The AI PODs that were introduced at Partner Summit are for inferencing and RAG use cases. Inferencing doesn’t necessarily require a large amount of storage. To align with customers’ needs, we wanted to make the storage component optional for this use case. Customers using the AI PODs for RAG can add NetApp or Pure as part of a converged infrastructure stack (FlexPod, FlashStack), which is delivered through a meet-in-the-channel model. For future PODs addressing use cases with greater storage needs, we will work with our storage partners to fully integrate.
****
Also, a FlexPod is an entire CI system in its own right, including Cisco servers (UCS), Cisco networking (Nexus and/or MDS), and NetApp storage, with more than 170 specific configurations. The storage can be ONTAP all-flash or hybrid arrays, or StorageGRID object storage systems.
Cisco’s AI POD design, purporting to be a complete CI stack for AI inferencing, needs to include specific NetApp storage products, not point to FlexPod, which is itself an entire CI stack.
Pure’s FlashStack is, like FlexPod, a full CI stack with “more than 25 pre-validated solutions to rapidly deploy and support any application or workload.” It has “integrated storage, compute, and network layers.”
Again, Cisco’s AI POD design needs to specify which Pure Storage products (FlashArray or FlashBlade) and which configurations are valid components of the AI POD, rather than simply referring to Pure’s full FlashStack CI.
It would make more sense if there were specific FlexPod for AI Inferencing or FlashStack for AI Inferencing designs. At least then customers could get converged AI infrastructure from a single supplier or its partners, instead of having to go to Cisco and then, separately, to NetApp or Pure. As it stands, the AI POD for Inferencing CI concept invites confusion by referring customers out to FlexPod and FlashStack, which are CI systems in their own right.