Los Alamos builds flash box for data analysis

Los Alamos box

Los Alamos National Lab engineers have built an accelerated box of flash drives (ABOF) using an Eideticom FPGA accelerator and Nvidia BlueField-2 SmartNICs to speed data access by 10-30x.

The intent is to develop an open storage system acceleration architecture for scientific data analysis. The accelerator-enabled, programmable, network-attached ABOF offloads work from host server CPUs running the scientific data analysis applications, and the design requires no major storage system software modifications and no application changes.

Dominic Manno, a researcher with the lab’s High Performance Computing Division, said: “Performing the complex analysis to enable scientific discovery requires huge advances in the performance and efficiency of scientific data storage systems. The ABOF programmable appliance enables high-performance storage solutions to more easily leverage the rapid performance improvements of networks and storage devices, ultimately making more scientific discovery possible. Placing computation near storage minimizes data movement and improves the efficiency of both simulation and data-analysis pipelines.”

Blocks & Files diagram of ABOF components and connectivity. We don’t know how many BlueField-2s are used, but 4 would be needed to handle 24 drives, according to an Nvidia reference document

Los Alamos co-created the ABOF with partners. SK Hynix provided 24 NVMe SSDs in U.2 format, Mellanox the BlueField-2 SmartNIC/DPU devices, Eideticom its NoLoad FPGA-based accelerator, and Aeon Computing designed and integrated the enclosure and components. Los Alamos says Eideticom created the NoLoad computational storage stack used to accelerate data-intensive operations and minimize data movement to and from the ABOF. 

Eideticom’s device can compress and decompress stored data, with a single U.2 PCIe Gen 3 x4 NoLoad device operating at more than 3GB/sec. This latest LANL-Eideticom work builds on earlier efforts, from 2020, looking at Eideticom-accelerated compression, as a white paper describes. The ABOF handles compression plus erasure coding and checksums with its accelerators.
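To make the offload concrete, here is a minimal host-side sketch, using zlib, of the kind of compress-then-checksum pass that would otherwise consume host CPU cycles. It is purely illustrative of the work the ABOF moves to its accelerators; it is not Eideticom's NoLoad interface, and the block size and sample data are made up.

```c
/* Illustrative only: a host-CPU compression + checksum pass of the kind the
 * ABOF offloads to its accelerators. zlib is used here purely to show the
 * work being moved off the CPU; this is not the NoLoad API. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    /* A 1MiB buffer standing in for a storage record (size is arbitrary). */
    const size_t src_len = 1 << 20;
    unsigned char *src = malloc(src_len);
    memset(src, 'A', src_len);               /* highly compressible sample data */

    uLongf dst_len = compressBound(src_len);
    unsigned char *dst = malloc(dst_len);

    /* Step 1: compression - done in hardware inside the ABOF. */
    if (compress2(dst, &dst_len, src, src_len, Z_BEST_SPEED) != Z_OK) {
        fprintf(stderr, "compress failed\n");
        return 1;
    }

    /* Step 2: checksum of the compressed block - another offload candidate. */
    uLong crc = crc32(0L, dst, dst_len);

    printf("compressed %zu -> %lu bytes, crc32 0x%08lx\n",
           src_len, (unsigned long)dst_len, crc);

    free(src);
    free(dst);
    return 0;
}
```

Compile with -lz; the point is simply that every byte run through routines like these on the host is CPU time taken away from the analysis application.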

The system uses the Linux Zettabyte File System (ZFS) with performance-critical functions offloaded to the accelerators in the ABOF. It has two relevant aspects: a ZFS Interface for Accelerators, which is available on GitHub, and a Linux DPU Services Module, also on GitHub. The latter is a kernel module which enables DPUs to be accessed directly from within the kernel, irrespective of where they sit on the data path.
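The article does not publish the interfaces, but the general shape is a registration pattern: ZFS calls into a small operations table, and whichever provider registered that table, an accelerator driver or a software fallback, does the work. The sketch below illustrates that pattern only; every struct and function name is invented for illustration and does not match the real ZFS Interface for Accelerators or DPU Services Module headers on GitHub.

```c
/* Purely illustrative user-space sketch of a provider-registration pattern:
 * the filesystem calls an operations table, and the registered provider
 * (FPGA, DPU, or software fallback) performs the work. All names here are
 * hypothetical, not the real ZIA/DPUSM API. */
#include <stdio.h>
#include <string.h>

/* Operations a provider can accelerate (hypothetical). */
struct offload_ops {
    const char *name;
    size_t (*compress)(const void *src, size_t len, void *dst, size_t dst_cap);
    unsigned long (*checksum)(const void *buf, size_t len);
};

/* Software fallback provider, used when no accelerator is present. */
static size_t sw_compress(const void *src, size_t len, void *dst, size_t dst_cap) {
    size_t n = len < dst_cap ? len : dst_cap;
    memcpy(dst, src, n);                      /* pass-through: no real compression */
    return n;
}
static unsigned long sw_checksum(const void *buf, size_t len) {
    const unsigned char *p = buf;
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)          /* trivial additive checksum */
        sum += p[i];
    return sum;
}

static const struct offload_ops *active;      /* currently registered provider */

static void register_provider(const struct offload_ops *ops) { active = ops; }

int main(void) {
    static const struct offload_ops software = {
        "software-fallback", sw_compress, sw_checksum
    };
    register_provider(&software);  /* an accelerator driver would register its own table */

    char block[] = "example zfs record";
    char out[64];
    size_t n = active->compress(block, sizeof block, out, sizeof out);
    printf("provider=%s bytes=%zu checksum=%lu\n",
           active->name, n, active->checksum(out, n));
    return 0;
}
```

The attraction of this style of interface is exactly what LANL claims: the filesystem and the application above it do not change, only the provider behind the table does.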

The ABOF’s drives are accessed across PCIe links and the Eideticom accelerator also has PCIe connectivity. The BlueField-2 DPUs link to accessing host systems via 200Gbit/s InfiniBand.
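For scale, a back-of-envelope link budget using only the figures above: 200Gbit/s of InfiniBand per BlueField-2 is roughly 25GB/sec, and four DPUs (the assumption from the diagram note, not a confirmed count) would give about 100GB/sec of network bandwidth across the 24 drives. The small sketch below just works through that arithmetic.

```c
/* Back-of-envelope link budget for the ABOF, using only figures from the
 * article: 200Gbit/s InfiniBand per BlueField-2, an assumed four DPUs
 * (per the diagram note), and 24 NVMe drives. Purely illustrative. */
#include <stdio.h>

int main(void) {
    const double ib_gbit_per_dpu = 200.0;                  /* 200Gbit/s InfiniBand */
    const double gbyte_per_dpu   = ib_gbit_per_dpu / 8.0;  /* ~25GB/sec per DPU */
    const int    dpus            = 4;                      /* assumption, not confirmed */
    const int    drives          = 24;

    double aggregate = gbyte_per_dpu * dpus;               /* ~100GB/sec network bandwidth */
    printf("per-DPU        : %.0f GB/sec\n", gbyte_per_dpu);
    printf("aggregate      : %.0f GB/sec across %d DPUs\n", aggregate, dpus);
    printf("per-drive share: %.1f GB/sec across %d drives\n", aggregate / drives, drives);
    return 0;
}
```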

LANL has successfully demonstrated the ABOF working, and the next step is for its researchers to integrate a set of common analysis functions into the system.

Comment

This system is the first we have seen to combine separate FPGA-based acceleration and BlueField-2 DPUs. Fungible, a competitor to both Nvidia and Eideticom, has single-device DPUs front-ending storage boxes and handling all the host server offload functions.

An Nvidia BlueField-2 reference document says the programmable DPU can carry out compression/decompression, erasure coding, and deduplication. The logic of using either Fungible or Nvidia DPUs is that you don’t need an FPGA accelerator as well.