VAST Data has announced its next-generation Ceres storage drive enclosure, which uses long ruler format drives and Nvidia BlueField-1 DPUs to deliver 50 percent more bandwidth. It is also developing a turnkey AI system with Nvidia’s DGX SuperPOD hooked up to Ceres.
Nvidia BlueField DPUs (Data Processing Units or SmartNICs) are programmable co-processors that offload non-application tasks from a host application server CPU, or replace storage controllers and NICs in an array, as they do in the VAST Ceres instance. E1.L long format ruler drives are an example of EDSFF (Enterprise and Datacenter SSD Form Factor), which is set to replace the M.2 and U.2 formats with higher-capacity drives supporting better thermal management in denser enclosures.
VAST CMO Jeff Denworth said: “While explosive data growth continues to overwhelm organizations who are increasingly challenged to find value in vast reserves of data, Ceres enables customers to realize a future of at-scale AI and analytics on all of their data as they build to SuperPOD scale and beyond.”
The aim is to enable faster processing of huge datasets while saving power, as the DPUs draw less power than the existing IO modules. The long ruler format opens the door to higher-capacity enclosures in the future, possibly double or more what is initially available with Ceres.
VAST’s hardware architecture relies on front-end stateless IO servers (Cnodes) communicating with requesting application servers using NFS or S3 protocols. These IO servers link to NVMe drives in back-end data enclosures across an NVMe-oF link. The current VAST data enclosure has two active-active IO modules, each with two Intel CPUs, in a 2RU chassis holding 44 x 15.36TB or 30.72TB U.2 format Intel SSDs, totaling 675TB or 1,350TB of raw capacity. These drives use QLC (4 bits/cell) NAND. The chassis also contains 12 x 1.5TB Optane or other storage-class memory (SCM) drives for caching and consolidating write data before striping it across the SSDs. The data enclosure delivers up to 40GB/sec of bandwidth across 4 x 100GbitE or 4 x EDR (100Gbit) IB connections.
Ceres is half the size of the existing data chassis, at 1RU high, and contains up to 22 x E1.L 9.5mm drives in the same 15.36TB or 30.72TB capacities, again using QLC NAND, for a maximum raw capacity of 675.84TB. The starting capacity is 338TB. We understand these are Solidigm D5-P5316 drives using 144-layer 3D NAND with a PCIe Gen 4 interface. VAST says its data reduction algorithms can provide an effective 2PB of capacity per Ceres box.
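As a rough check, the raw-capacity figures for both enclosures can be recomputed from the drive counts and sizes quoted above. The sketch below is illustrative only: the function and variable names are ours, and the implied data reduction ratio assumes that VAST's "effective 2PB" figure refers to a fully populated 30.72TB configuration measured in decimal units, which the company has not confirmed.

```python
# Recompute the raw-capacity figures quoted above from drive counts and sizes.
QLC_DRIVE_TB = (15.36, 30.72)  # the two SSD capacities cited in the article


def raw_capacity_tb(drive_count: int, drive_tb: float) -> float:
    """Raw capacity in TB, before any data reduction."""
    return round(drive_count * drive_tb, 2)


# Existing enclosure: 44 U.2 drives in 2RU. Ceres: 22 E1.L drives in 1RU.
existing = {tb: raw_capacity_tb(44, tb) for tb in QLC_DRIVE_TB}
ceres = {tb: raw_capacity_tb(22, tb) for tb in QLC_DRIVE_TB}

# Raw density per rack unit with the largest drives.
existing_tb_per_ru = existing[30.72] / 2
ceres_tb_per_ru = ceres[30.72] / 1

# Assumption: the "effective 2PB" claim refers to the fully populated
# 30.72TB configuration and uses decimal units (2PB = 2,000TB).
implied_reduction_ratio = 2000 / ceres[30.72]

print("Existing 2RU raw TB:", existing)
print("Ceres 1RU raw TB:", ceres)
print("Raw TB per RU:", existing_tb_per_ru, ceres_tb_per_ru)
print("Implied reduction ratio:", round(implied_reduction_ratio, 2))
```

On these numbers the raw density per rack unit is unchanged at launch. Ceres halves the size of the building block rather than packing more flash into each rack unit, and the effective 2PB figure would correspond to a data reduction ratio of roughly 3:1 under the stated assumption.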
There are four ARM-powered BlueField-1 BF1600 DPUs, which replace the two existing Intel-driven IO modules and provide more than 60GB/sec of network capacity. VAST tells us: “There’s some VAST code for authentication and connection setup running on the Bluefield ARM cores but it’s limited.”
Also: “Under normal conditions each Bluefield connects 1/4th of the Ceres SSDs (both SCM and QLC) to the fabric in a 4 way active-active config.” Each BF1600 has PCIe Gen 4 x16 connectivity.
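The bandwidth claims can be tied together with some simple arithmetic. The following is a back-of-the-envelope sketch, not a performance model: it ignores protocol overhead, and the even split of bandwidth across the four DPUs is our assumption, based on the 4-way active-active description rather than on any per-DPU figure from VAST.

```python
BITS_PER_BYTE = 8


def line_rate_gbytes(ports: int, gbit_per_port: int) -> float:
    """Aggregate line rate in GB/sec, ignoring protocol overhead."""
    return ports * gbit_per_port / BITS_PER_BYTE


# Existing enclosure: 4 x 100Gbit ports, quoted at up to 40GB/sec delivered.
existing_line_rate = line_rate_gbytes(4, 100)
existing_delivered = 40.0

# Ceres: "50 percent more bandwidth" than the existing enclosure.
ceres_bandwidth = existing_delivered * 1.5

# Assumption: bandwidth is shared evenly across the four BlueField DPUs.
DPU_COUNT = 4
per_dpu_share = ceres_bandwidth / DPU_COUNT

print("Existing line rate (GB/sec):", existing_line_rate)
print("Ceres bandwidth (GB/sec):", ceres_bandwidth)
print("Per-DPU share (GB/sec):", per_dpu_share)
```

The 50 percent uplift on 40GB/sec matches the "more than 60GB/sec" quoted for Ceres, which works out to about 15GB/sec per DPU if the load is spread evenly.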
VAST will eventually move to using BlueField-3 DPUs.
The number of U.2 format SCM drives is reduced from 12 in the 2RU chassis to 8 in Ceres, with a total of 6.4TB of SCM capacity. These are 800GB Kioxia FL6 drives.
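To put the SCM change in context, the sketch below compares SCM capacity with maximum raw QLC capacity in each enclosure, using only figures quoted in this article. Comparing against the fully populated 30.72TB configurations is our choice for illustration; VAST has not published a target SCM-to-flash ratio here.

```python
def scm_to_qlc_percent(scm_drives: int, scm_tb: float,
                       qlc_drives: int, qlc_tb: float) -> float:
    """SCM capacity as a percentage of raw QLC capacity."""
    return 100 * (scm_drives * scm_tb) / (qlc_drives * qlc_tb)


# Existing 2RU enclosure: 12 x 1.5TB SCM alongside 44 x 30.72TB QLC.
existing_scm_tb = 12 * 1.5
existing_ratio = scm_to_qlc_percent(12, 1.5, 44, 30.72)

# Ceres: 8 x 800GB SCM alongside 22 x 30.72TB QLC.
ceres_scm_tb = 8 * 0.8
ceres_ratio = scm_to_qlc_percent(8, 0.8, 22, 30.72)

print("Existing SCM TB:", existing_scm_tb, "ratio %:", round(existing_ratio, 2))
print("Ceres SCM TB:", round(ceres_scm_tb, 1), "ratio %:", round(ceres_ratio, 2))
```

SCM falls from 18TB to 6.4TB per enclosure, which is a drop from about 1.3 percent to just under 1 percent of maximum raw QLC capacity.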
VAST says that ruler-based flash drives will, over time, pack in more flash capacity than traditional U.2 format NVMe drives because they have a much larger surface area. Solidigm is also developing penta-level cell (PLC) drives with 5 bits/cell capacity, providing a 25 percent capacity boost over QLC drives.
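The 25 percent figure follows directly from the bits-per-cell counts, as this short illustrative calculation shows. The count of charge levels per cell (two to the power of the number of bits) is general NAND background rather than something stated by VAST or Solidigm.

```python
QLC_BITS_PER_CELL = 4
PLC_BITS_PER_CELL = 5

# Capacity gain for the same number of cells.
capacity_gain_percent = 100 * (PLC_BITS_PER_CELL / QLC_BITS_PER_CELL - 1)

# Each cell must distinguish 2**bits charge levels.
qlc_levels = 2 ** QLC_BITS_PER_CELL
plc_levels = 2 ** PLC_BITS_PER_CELL

print("PLC capacity gain over QLC (%):", capacity_gain_percent)
print("Charge levels per cell:", qlc_levels, "vs", plc_levels)
```

The extra bit adds a quarter more capacity but doubles the number of charge levels each cell has to distinguish.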
Ceres data enclosures can scale out to hundreds of petabytes and be mixed and matched with VAST’s existing data enclosures in its Universal Storage clusters. The Ceres chassis are fully front and rear serviceable, and VAST’s system supports full enclosure failover. Ceres will be built by VAST design partners such as AIC and Mercury Computer.
Nvidia and AI
VAST and Nvidia are working to make Ceres a data platform foundation for a turnkey AI datacenter with Nvidia’s DGX SuperPOD. Ceres is in the process of being certified for the GPU server. The DGX system itself has BlueField client-side DPUs. VAST and Nvidia are collaborating on new storage services to enable zero-trust security and offload functionality by using these client-side DPUs with Ceres.
VAST Data Universal Storage certification for DGX SuperPOD is slated for availability by mid-2022. The company says that, to date, it has received software orders to support over 170PB of data capacity to be deployed on Ceres platforms.