WEKA is previewing its parallel file system running on a storage server cluster powered by Nvidia’s Grace CPU Superchips, and has developed the WEKA AI RAG Reference Platform (WARRP), a reference architecture intended to speed up GenAI inferencing development.
The cluster runs WEKA’s Data Platform software on Supermicro storage servers built around the Grace Superchips. The software has a distributed architecture and uses kernel-bypass technology to cut I/O latency.
WEKA says Nvidia’s Grace packs the performance of a flagship two-socket x86-64 workstation or server platform into a single module. Each Grace Superchip pairs two Grace CPUs, each with 72 Arm Neoverse V2 cores, connected by a Scalable Coherency Fabric (SCF) that delivers 3.2 TBps of bisection bandwidth, plus high-speed LPDDR5X memory supplying up to 500 GBps of bandwidth per CPU. The Grace CPU servers use BlueField-3 DPUs, also Arm-powered and with RDMA/RoCE acceleration, to offload networking and other tasks from the Grace CPUs, and ConnectX-7 networking links the servers within the cluster.
The main benefit is power efficiency, with WEKA saying the Grace CPU Superchip delivers the performance of a dual-socket x86 server at half the power. Customers don’t forgo speed, though, as its software, “combined with Grace CPUs’ LPDDR5X memory architecture, ensures up to 1 TBps of memory bandwidth and seamless data flow, eliminating bottlenecks.” That 1 TBps figure is simply the two Grace CPUs’ 500 GBps apiece, combined.
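As a back-of-envelope check, the headline numbers follow directly from the spec figures quoted above; a quick sketch (nothing here is a new measurement):

```python
# Quick sanity check on the Grace Superchip figures quoted above.
cpus_per_superchip = 2
cores_per_cpu = 72              # Arm Neoverse V2 cores per Grace CPU
mem_bw_per_cpu_gbps = 500       # LPDDR5X bandwidth per Grace CPU, GB/s

total_cores = cpus_per_superchip * cores_per_cpu                     # 144
total_mem_bw_tbps = cpus_per_superchip * mem_bw_per_cpu_gbps / 1000  # 1.0

print(f"{total_cores} cores, {total_mem_bw_tbps:.1f} TBps aggregate memory bandwidth")
```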
WEKA’s chief product officer, Nilesh Patel, stated: “AI is transforming how enterprises around the world innovate, create, and operate, but the sharp increase in its adoption has drastically increased data center energy consumption, which is expected to double by 2026, according to the International Energy Agency.”
Altogether, WEKA says, customers “can achieve faster AI model training, reduced epoch times, and higher inference speeds, making it the ideal solution for scaling AI workloads efficiently.”
Patrick Chiu, Supermicro’s Senior Director for Storage Product Management, said: “The system design features 16 high-performance Gen5 E3.S NVMe SSD bays along with three PCIe Gen 5 networking slots, which support up to two Nvidia ConnectX-7 or BlueField-3 SuperNIC networking adapters and one OCP 3.0 network adapter. The system is ideal for high-performance storage workloads like AI, data analytics, and hyperscale cloud applications.”
WEKA claims that, through data copy reduction and cloud elasticity, its software can shrink data infrastructure footprints by 4-7x, avoid up to 260 tons of CO2e per PB stored annually, and cut energy costs by 10x.
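Taken at face value, those vendor figures reduce to simple arithmetic; a sketch with a hypothetical 10 PB deployment as the input:

```python
# Back-of-envelope estimate built on WEKA's stated figures (vendor claims,
# not independently verified). The deployment size is a hypothetical input.
raw_capacity_pb = 10                 # hypothetical deployment
footprint_reduction = 5              # within WEKA's claimed 4-7x range
co2e_avoided_per_pb_tons = 260       # WEKA's per-PB-per-year figure

effective_footprint_pb = raw_capacity_pb / footprint_reduction
co2e_avoided_tons = raw_capacity_pb * co2e_avoided_per_pb_tons

print(f"Effective footprint: {effective_footprint_pb:.1f} PB; "
      f"up to {co2e_avoided_tons:,} tons CO2e avoided per year")
```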
WARRP is a design blueprint for building an inferencing infrastructure framework that incorporates retrieval-augmented generation (RAG), in which large language models (LLMs) pull in data from external sources at inference time. WEKA says “using RAG in the inferencing process can help reduce AI model hallucinations and improve output accuracy, reliability and richness, reducing the need for costly retraining cycles.”
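In outline, the RAG pattern is straightforward: retrieve passages relevant to the query, prepend them to the prompt, then generate. A minimal sketch of that loop follows; the toy word-overlap retriever and stubbed generate() are stand-ins for a real embedding model, vector database, and LLM, not WARRP’s actual components:

```python
# Minimal sketch of the RAG pattern WARRP targets: retrieve relevant context,
# prepend it to the prompt, then generate. Toy components for illustration.
DOCUMENTS = [
    "WEKA's Data Platform runs on Supermicro servers with Grace Superchips.",
    "WARRP combines NIM microservices, NeMo Retriever, Run:ai, and Milvus.",
    "Grace Superchips pair two 72-core Arm Neoverse V2 CPUs.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub for an LLM call (in a real deployment, e.g. a NIM endpoint)."""
    return f"[model output grounded in {prompt.count('Context:')} retrieved passages]"

def rag_answer(question: str) -> str:
    context = "\n".join(f"Context: {d}" for d in retrieve(question))
    return generate(f"{context}\nQuestion: {question}\nAnswer:")

print(rag_answer("What CPUs does WARRP's storage layer run on?"))
```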
WARRP is based on WEKA’s Data Platform software allied to Nvidia’s NIM microservices and NeMo Retriever, AI workload and GPU orchestration from Run:ai, Kubernetes for container orchestration, and the Milvus vector database for data ingestion and retrieval. It has similarities to Pure Storage’s GenAI Pod, also announced at SC24, and is hardware, software, and cloud-agnostic.
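Within that stack, ingestion into Milvus is the step that makes documents retrievable. Below is a minimal sketch using pymilvus’s simplified client API, assuming a locally reachable Milvus instance; the collection name, 768-wide vectors, and random placeholder embeddings are illustrative assumptions, not WARRP defaults:

```python
# Minimal sketch of the Milvus ingestion step in a WARRP-style pipeline.
# Assumes a reachable Milvus instance and pymilvus installed; the random
# vectors stand in for embeddings from a real model (e.g. NeMo Retriever).
import random
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumed endpoint
DIM = 768  # typical embedding width; depends on the embedding model

client.create_collection(collection_name="warrp_docs", dimension=DIM)

docs = ["RAG grounds LLM output in retrieved passages.",
        "WARRP ingests documents into Milvus for retrieval."]
client.insert(
    collection_name="warrp_docs",
    data=[{"id": i, "vector": [random.random() for _ in range(DIM)], "text": d}
          for i, d in enumerate(docs)],
)

# At query time, embed the question the same way and search for neighbors.
hits = client.search(collection_name="warrp_docs",
                     data=[[random.random() for _ in range(DIM)]],
                     limit=2, output_fields=["text"])
print(hits)
```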
Run:ai CTO Ronen Dar said: “The WARRP reference architecture provides an excellent solution for customers building an inference environment, providing an essential blueprint to help them develop quickly, flexibly and securely using industry-leading components from Nvidia, WEKA and Run:ai to maximize GPU utilization across private, public and hybrid cloud environments.”
The first release of the WARRP reference architecture is now available to download from WEKA. The WEKA and Supermicro Grace CPU Superchip storage system will be commercially available in early 2025. SC24 attendees can visit WEKA at Booth #1931 for more details and a demo of the system.