HPE launches cost-effective storage system for HPC and AI

HPE has built a downsized ClusterStor supercomputer storage array for entry-level and mid-range HPC and AI compute clusters.

Update: The C500 does support Nvidia’s GPUDirect protocol. 7 May 2024.

The ClusterStor line, acquired by HPE when it bought Cray in 2019, has a parallel architecture using SSDs and HDDs with Lustre file system software. Its XE E1000 model scales from 60 TB to tens of petabytes across hundreds of racks, each with up to 6.8 PB of capacity. It delivers up to 1.6 TBps and 50 million IOPS per rack. HPE positions ClusterStor as storage for the exascale (Frontier, Aurora, El Capitan), pre-exascale (LUMI, Perlmutter, Adastra), and national AI (Isambard-AI, Alps, Shaheen III) supercomputers built on Cray EX systems.

Ulrich Plechschmidt, HPE

Ulrich Plechschmidt, HPE Product Marketing for parallel HPC and AI storage, says the new Cray Storage Systems C500 will “provide [E1000] leadership-class storage technologies at a fraction of the entry-price point and with increased ease-of-use.”

It’s based on the E1000 and intended for use by customers running modeling, simulation, and AI workloads on smaller compute clusters, often built, Plechschmidt says, with Cray XD systems.

The Cray EX system is a liquid-cooled, rack-scale, high-end supercomputer, while the lower-end XD can be air-cooled or liquid-cooled and comes in a 2RU chassis. Both the EX and XD support AMD and Intel x86 CPUs and Nvidia Hopper GPUs.

HPE Cray XD systems

The mid-range XD665 supports Slingshot 11, InfiniBand NDR, and Ethernet networking, and provides direct switchable connections between its high-speed fabric, GPUs, NVMe drives, and CPUs. It supports Nvidia’s GPUDirect protocol.

Bearing in mind its open source-based Lustre support, Plechschmidt declares that a C500 buyer can “feel secure in the fact that your valuable data sits in a file system that is owned by a vibrant community and not by a single company.” Such as, we might think, the one run by Jensen Huang.

The C500 runs the same Lustre software as the E1000, with the same 2RU x 24 drive storage controllers and 5RU x 84 HDD enclosures in a converged and less expensive design.

Entry-level HPE C500 with controller and storage chassis

C500 details:

  • A ProLiant DL325 server that is cheaper than the E1000’s System Management Unit (SMU) storage controller
  • A combined Metadata Unit (MDU) and Scalable Storage Unit Flash (SSU-F) chassis holding 2RU x 24 NVMe SSDs
  • Support for half and fully populated storage enclosures in specific configurations
  • C500 expansion chassis with 2RU x 24 NVMe drives or 5RU x 84 HDDs, which can increase the usable file system capacity to 2.6 PB all-flash or 4 PB hybrid (SSD/HDD)

The entry-level C500 provides between 22 TB and 513 TB of usable capacity from 24 NVMe SSDs, delivering up to 80 GBps aggregate read and 60 GBps write performance to the compute nodes. In comparison, IBM’s GPUDirect-supporting ESS 3500 delivers 126 GBps read and 60 GBps write bandwidth to Nvidia GPUs using the Storage Scale parallel file system. DDN’s Lustre-using AI400X2 Turbo provides 120 GBps read and 75 GBps write bandwidth to these same GPUs.
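
The bandwidth comparison above can be put on a common footing with a few lines of Python. This is only a rough sketch: the figures are the ones quoted in this article, the dictionary keys and the relative_to_baseline function name are illustrative choices, and the comparison is not strictly like-for-like, since the C500 numbers are quoted to compute nodes while the IBM and DDN numbers are quoted to Nvidia GPUs.

```python
# Bandwidth figures in GBps, as quoted in the article (not independently measured).
SYSTEMS = {
    "HPE C500 (entry-level)": {"read": 80, "write": 60},
    "IBM (Storage Scale)": {"read": 126, "write": 60},
    "DDN (Lustre)": {"read": 120, "write": 75},
}

BASELINE = "HPE C500 (entry-level)"


def relative_to_baseline(systems, baseline):
    """Express each system's read/write bandwidth as a multiple of the baseline's."""
    base = systems[baseline]
    return {
        name: {op: figures[op] / base[op] for op in ("read", "write")}
        for name, figures in systems.items()
    }


ratios = relative_to_baseline(SYSTEMS, BASELINE)

# Print one line per system, with the baseline itself showing as 1.00x.
for name, r in ratios.items():
    print(f"{name}: read {r['read']:.2f}x, write {r['write']:.2f}x")
```

On these quoted numbers, the entry-level C500 matches the IBM system on write bandwidth and trails both competitors on read bandwidth, which fits HPE's positioning of it as a lower-cost entry point.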

Plechschmidt says HPE is “rolling out major software improvements and new functionalities that make the storage systems easier to deploy and easier to manage.” Bizarrely, the details are hidden behind an HPE QuickSpecs webpage requiring an authorized partner or HPE employee login. We ordinary folks don’t get to see them.

HPE QuickSpecs webpage

HPE has since fixed the problem, saying: “There was a disconnect internally on when the QuickSpecs doc would go live, but it was not yet live today which is why you got that message.” Download the doc here.