Seagate object storage used in exascale computing projects

Seagate’s CORTX object storage was used for high-performance research projects in the European Union’s SAGE Exascale computing initiative.

SAGE, started in 2015, is one of those weird made-up acronyms, and apparently stands for Percipient StorAGe for Exascale Data Centric Computing. PSEDCC doesn’t have the same memorable ring to it. Anyway, the SAGE system, which aimed to merge Big Data analytics and HPC, took a storage-centric approach, in that it was meant for storing and processing large data volumes at exascale.

According to an ACM document abstract, “The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces.” [Mero was a prior name for what became CORTX.]

[SAGE paper diagram]

A first prototype of the SAGE system was implemented and installed at the Jülich Supercomputing Center in Germany. A SAGE 2 project was set up in 2018 to validate a next-generation storage system, building on SAGE, for extreme-scale scientific computing workflows and AI/deep learning. It “provides a highly performant and resilient, QoS capable multi tiered storage system, with data layouts across the tiers managed by the Mero Object Store, which is capable of handling in-transit/in-situ processing of data within the storage system, accessible through the Clovis API.”

SAGE and SAGE 2 have given rise to research papers, such as a doctoral thesis by Wei Der Chien, a student at the KTH Royal Institute of Technology in Stockholm, entitled “Large-scale I/O Models for Traditional and Emerging HPC Workloads on Next-Generation HPC Storage Systems.” This looked at using an object store for HPC applications. Chien developed a programming interface that can be used to leverage Seagate’s Motr object store.

Motr

Motr, according to GitHub documentation, is a distributed object and key-value storage system that sits at the heart of Seagate’s CORTX object store and uses high-capacity drives. Its design was influenced by the Lustre distributed, parallel filesystem, NFS v4.0, and database technology. Motr interacts directly with block devices and is not layered on top of a local file system. It provides a filesystem interface but is not, itself, a filesystem.

Motr controls a cluster of networked storage nodes, which can be disk-based or solid state-based – meaning flash, faster PCIe-attached flash, battery-backed memory, and phase-change memory. Each Motr node caches a part of the system state. This cache consists of metadata (information about directories, files, and their attributes) and data (file contents, usually in the form of pages). The cache can be held in volatile memory or on a persistent store.

I/O activity results in system state updates, which can occur on multiple nodes. State updates are gradually moved towards more persistent stores. For example, an update to an in-memory page cache might be propagated to a cache stored on a flash drive and later to a cache stored on a disk drive.
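As a rough illustration of that write-back pattern, here is a toy sketch in C – not Motr code; the types and functions are hypothetical – showing an update landing in a volatile page cache and then being demoted, step by step, towards more persistent tiers.

```c
/* Toy illustration of write-back propagation across storage tiers.
 * NOT Motr code: cached_page, page_update and page_demote are
 * hypothetical, and only sketch the idea of an update moving from
 * memory to flash to disk. */
#include <stdio.h>
#include <string.h>

enum tier { TIER_MEMORY, TIER_FLASH, TIER_DISK };

static const char *tier_name[] = { "memory", "flash", "disk" };

struct cached_page {
    char      data[4096];   /* page contents                        */
    enum tier where;        /* least persistent tier holding it     */
    int       dirty;        /* modified since it last reached disk? */
};

/* Apply an update: the page becomes dirty in the volatile cache. */
static void page_update(struct cached_page *p, const char *bytes, size_t len)
{
    memcpy(p->data, bytes, len < sizeof p->data ? len : sizeof p->data);
    p->where = TIER_MEMORY;
    p->dirty = 1;
}

/* Move the page one step towards a more persistent store. */
static void page_demote(struct cached_page *p)
{
    if (p->where == TIER_DISK)
        return;                           /* already fully persistent */
    p->where = (enum tier)(p->where + 1); /* memory -> flash -> disk  */
    if (p->where == TIER_DISK)
        p->dirty = 0;
    printf("page propagated to %s cache\n", tier_name[p->where]);
}

int main(void)
{
    struct cached_page p = { .where = TIER_DISK, .dirty = 0 };

    page_update(&p, "new file contents", 18);
    page_demote(&p);   /* in-memory page cache -> flash cache */
    page_demote(&p);   /* flash cache -> disk cache           */
    return 0;
}
```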

A Seagate spokesperson told us the SAGE platform at the Jülich Supercomputing Center ran CORTX Motr with 22 nodes: 8 clients and 14 storage nodes. The storage nodes had multiple tiers – NVRAM, SSD, and HDD – served by different Motr pools, and formed a single Motr cluster with these multiple performance tiers.

Users specify which pool to use, and there is a user-directed Hierarchical Storage Management (HSM) tool to move data between pools. This connects to the libmotr interface, as do the HPC applications. We’re told that libmotr is more HPC- and AI-friendly than Amazon’s S3, with high-performance options such as scatter-gather I/O and direct connections via MPI-IO.
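To make the idea concrete, here is a minimal sketch of what that kind of low-level call pattern looks like: select a storage pool, then gather several non-contiguous memory buffers into a single object write. The names used (pool_t, object_put_gather) are hypothetical stubs for illustration, not the real libmotr entry points.

```c
/* Sketch of a pool-directed, scatter-gather object write.
 * pool_t and object_put_gather are hypothetical stubs, not libmotr. */
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>            /* struct iovec: POSIX gather list */

typedef struct { const char *name; } pool_t;

/* Hypothetical stub: write a gather list of buffers as one object
 * into the chosen pool. A real implementation would issue the I/O
 * to the storage nodes backing that pool (NVRAM, SSD or HDD tier). */
static int object_put_gather(pool_t *pool, const char *object_id,
                             const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    printf("object %s: %zu bytes written to pool '%s' from %d buffers\n",
           object_id, total, pool->name, iovcnt);
    return 0;
}

int main(void)
{
    pool_t ssd_pool = { "ssd-tier" };        /* user-selected pool */
    char   header[64] = "simulation step 42";
    double field[1024] = { 0.0 };

    /* Gather non-contiguous buffers into a single object write. */
    struct iovec iov[2] = {
        { .iov_base = header, .iov_len = strlen(header) },
        { .iov_base = field,  .iov_len = sizeof field   },
    };
    return object_put_gather(&ssd_pool, "step-0042", iov, 2);
}
```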

Some in the HPC community prefer to avoid high-level interfaces like S3, opting instead for low-level interfaces, such as libmotr, and APIs that provide greater control.

NoaSci

This month, Chien and others authored a follow-on paper called “NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage.” We have not seen the whole document, but its abstract states: “While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks.” The researchers devised NoaSci, a Numerical Object Array library for scientific applications, which supports different data formats (e.g. HDF5, binary) and focuses on supporting node-local burst buffers and object stores.

They then showed how scientific applications can perform parallel I/O on Seagate’s Motr object store through NoaSci.
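We have not seen NoaSci’s actual interface, but the general pattern it describes – each process writing its piece of a numerical array as its own object, rather than contending for offsets in one shared POSIX file – might look something like this. Here put_object is a hypothetical stub; only the MPI calls are real.

```c
/* Sketch of per-rank parallel array output to an object store.
 * put_object is a hypothetical stand-in for a Motr/NoaSci call. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stub for "store this buffer as an object named key". */
static void put_object(const char *key, const void *buf, size_t len)
{
    printf("PUT %s (%zu bytes)\n", key, len);
    (void)buf;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank owns a contiguous chunk of the global array. */
    const size_t chunk = 1 << 20;
    double *local = malloc(chunk * sizeof *local);
    if (local == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);
    for (size_t i = 0; i < chunk; i++)
        local[i] = (double)rank;

    /* The object key encodes variable name, timestep and rank, so the
     * global array can be reassembled on read without any shared-file
     * offset arithmetic or locking. */
    char key[64];
    snprintf(key, sizeof key, "temperature/step0001/rank%04d", rank);
    put_object(key, local, chunk * sizeof *local);

    free(local);
    MPI_Finalize();
    return 0;
}
```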

Seagate technical staff in Senior Vice President Ken Claffey’s systems business team were involved in the SAGE and SAGE 2 projects, which in turn informed Chien’s research.

The Motr low-level object API was co-designed by Seagate with its EU HPC partners, including Professor Stefano Markidis at KTH; Chien is a student of Markidis. Markidis’s Google Scholar page shows that his sixth most-cited publication is the original SAGE work, on which Sai Narasimhamurthy, a UK-based Seagate engineering director, was a co-author.

Another cited paper, “MPI windows on storage for HPC applications,” was co-authored by Markidis, Narasimhamurthy, and others.

Seagate told us: “We were honored that CORTX Motr was the chosen object storage system for these projects and greatly benefited from these relationships, which drove the CORTX Motr interface to be what it is today and remains the preferred interface for many within this community.”

Seagate has added an S3 interface for enterprise and cloud users who prefer a higher-level interface and are not, typically, willing to rewrite their applications to achieve very high performance.

The SAGE and SAGE 2 projects have ended, but Seagate continues its collaboration with KTH and others in the IO-SEA and ESiWACE (https://www.esiwace.eu/) projects.

Comment

MinIO has made much of the running in positioning object storage as a primary data store for applications needing fast access to large amounts of data. Now we find that, nestled in European academic HPC research, Seagate’s CORTX object storage software has a low-level interface to its core Motr system, enabling HPC users to enjoy fast access to object data as well.

But, to enjoy that speed, CORTX has to be used through the libmotr API, meaning application software changes are required. It would be fascinating to see whether CORTX, via libmotr, is as fast as or even faster than MinIO, and whether CORTX could have a future in the commercial sphere for fast-access object storage.