The DAOS parallel file system delivers higher IOPS and bandwidth per server than Lustre and WEKA, according to IO500 benchmark data cited by the DAOS Foundation.
DAOS, the Distributed Asynchronous Object Store parallel file system, originated at Intel in 2015 as a research project for the US Department of Energy. Intel evaluated Lustre, Storage Scale (GPFS), PanFS, and BeeGFS and concluded it could develop a more advanced solution. DAOS provided an asynchronous I/O model, a distributed key-value store architecture, and a user-space implementation that avoided Linux kernel bottlenecks, and it was designed for low latency, high bandwidth, and a high IOPS rating. DAOS was also architected to use Intel’s Optane storage-class memory for faster metadata handling. When Optane was canceled in the summer of 2022, much of the impetus behind that design disappeared.
The software was released under an Apache 2.0 open source license in 2019 and used in the much-delayed Aurora supercomputer, built by Intel and Cray, at the Argonne National Laboratory. Intel and Cray won the deal in 2015, and the system was meant to be up and running in 2018, but it was actually installed in June 2023, after Optane had been canned, and went live in January 2025.
The DAOS Foundation was formed in November 2023 by Argonne National Lab, Enakta Labs, Google, HPE, and Intel – the main players invested in the software. The UK’s Enakta Labs offers a DAOS-based Enakta Data Platform HPC storage system. HPE bought Cray in May 2019. Google used DAOS in its 2024 Parallelstore HPC file system for AI, ML, and scientific computing, having started a DAOS collaboration with Intel in 2020. VDURA joined in 2024, seeing DAOS as having potential advantages over its long-lived PanFS parallel file system in the AI area.
The Leibniz Supercomputing Centre (LRZ) is also a DAOS user, deploying it in its SuperMUC-NG Phase 2 system in the Munich area in Germany, but LRZ is not a foundation member.
The DAOS Foundation wants to keep DAOS going as a vendor-independent open source project. The Foundation has a technical steering committee, chaired by Johann Lombardi, an HPE Senior Distinguished Technologist. Between 2013 and 2020, he was an HPC Software Architect and then Principal Engineer in Intel’s Extreme Storage Architecture and Development Unit. He is a long-term DAOS developer and advocate.

Lombardi presented a DAOS session to an IT Press Tour event outlining its architecture and how it recovered from Optane’s demise. There is a library of DAOS routines (libdaos) layered above two types of DAOS instance – a DAOS control plane and a DAOS engine. Clients talk to the engines over RDMA, and the engines manage the storage drives.

A protocol and middleware layer sits above libdaos and presents file, block, and object interfaces to client applications. This layer includes HPC I/O middleware, and AI and big data frameworks. All these execute as part of client compute instances.
DAOS is inherently multi-tenant and has a dataset concept in which tenants have one or more datasets, their basic unit of storage. Datasets are defined by their capacity, throughput, IOPS rating, and other qualities, such as their type – for example, POSIX file system, key-value store, or Python. They are spread across drives, with I/O handled by the data engines in each node.

Datasets have access control lists and can be snapshotted as a unit. POSIX datasets can include trillions of files and directories. A tenant dataset is viewed by DAOS as a container of a certain type, which is stored as a set of objects in its multi-level key-value store.
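The multi-level key-value layout can be pictured as a container holding objects, where each object maps a distribution key (dkey) to attribute keys (akeys) holding values. The sketch below is a toy Python model of that hierarchy only – the class and method names are illustrative and are not the libdaos API:

```python
# Toy model of DAOS's multi-level key-value layout: a container holds
# objects, each object maps a dkey to akeys, each akey holds a value.
# Names are illustrative, not the actual libdaos API.
from collections import defaultdict

class ToyContainer:
    def __init__(self, ctype):
        self.ctype = ctype  # container type, e.g. "POSIX" or "KV"
        # object id -> dkey -> akey -> value
        self.objects = defaultdict(lambda: defaultdict(dict))

    def update(self, oid, dkey, akey, value):
        self.objects[oid][dkey][akey] = value

    def fetch(self, oid, dkey, akey):
        return self.objects[oid][dkey][akey]

# A POSIX file might be modeled as an object whose dkeys are chunk
# indices, letting chunks land on different storage targets.
c = ToyContainer("POSIX")
c.update(oid=1, dkey="chunk-0", akey="data", value=b"hello")
assert c.fetch(1, "chunk-0", "data") == b"hello"
```

Spreading dkeys across storage targets is what lets a single container scale to the trillions of files mentioned above.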

DAOS metadata and data operations, such as updates and inserts, were previously written to Optane storage-class memory in the particular DAOS node and thus persisted for recovery if a DAOS operation failed mid-transaction. Post-Optane, a Write-Ahead Log (WAL) is held in DRAM and persisted to NVMe SSD storage. Data and metadata are written in parallel and can go to a dedicated SSD or to ones shared with normal DAOS storage duties. Checkpoint data goes to the metadata SSD as well. Lombardi said pre- and post-Optane performance was comparable.
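The recovery property described above is the standard write-ahead-log technique: persist the intent of each update before applying it in memory, so a crash mid-transaction can be repaired by replaying the log. A minimal sketch of that general pattern, not of DAOS’s actual implementation, might look like this:

```python
# Minimal write-ahead-log sketch: log each update durably before
# applying it in memory, so state survives a crash via log replay.
# Illustrates the general technique, not DAOS's actual code.
import json
import os

class ToyWAL:
    def __init__(self, path):
        self.path = path
        self.state = {}   # the in-DRAM copy of the metadata
        self._replay()    # recover any state persisted before a crash
        self.log = open(path, "a")

    def _replay(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.state[key] = value

    def update(self, key, value):
        # Persist the intent first (on DAOS this lands on an NVMe SSD)...
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # ...then apply it to the in-memory copy.
        self.state[key] = value
```

A real system would also checkpoint the in-memory state periodically and truncate the log, which is the role the checkpoint writes to the metadata SSD play here.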

The Aurora DAOS system in Optane mode won the top IO500 Production Overall Score slot in 2023. It was an HPE Cray EX Supercomputer with more than 9,000 x86 Sapphire Rapids nodes using more than 230 PB of DAOS storage across 1,024 nodes with dual Slingshot NICs, delivering some 25 TBps of bandwidth. Each node has two Sapphire Rapids CPUs and six Xe-based Ponte Vecchio GPUs. There is between 220 and 249 PB of capacity, depending on the redundancy level chosen.
DAOS won the top two slots in the SC 2024 production list: Argonne first with 300 client nodes and a score of 32,165.09, and LRZ second with 90 client nodes, scoring 2,508.85. An HPE Lustre system (2,080 nodes) was third, scoring 797.04, with a WEKA system (261 nodes) fourth, rated at 665.49. A 2,000-node DDN EXAScaler/Lustre system was fifth, scoring 648.96.
Lombardi presented a per-server chart to emphasize DAOS’s superiority:

DAOS is reported to be up to three times faster per server than Lustre or WEKA, based on IO500 benchmarks. What now? The DAOS roadmap looks like this:

DAOS v2.6 in July 2024 was the last Intel release. Version 2.8 will be the first community release and is slated for delivery later this year. A v3.0 release is being worked on for 2026, and a subsequent version is under development as well.
Comment
DAOS has a highly credible flagship win with its Aurora supercomputer user. It is open source HPC/supercomputing software with AI training and inference aspirations, and it needs developer adoption to grow. Existing HPC, supercomputing, and AI training/inference users have adopted other high-performance software, such as Storage Scale, Lustre, VAST Data, WEKA, or ThinkParQ’s BeeGFS, or they are using hardware/software combinations like HPE Cray and DDN. They will likely not adopt DAOS, even though it’s open source, because of the conversion cost, unless there is some clear advantage, such as improved performance, lower cost, or freedom from vendor lock-in.
In the AI training and inference area, dominated by Nvidia GPU servers, most suppliers of storage hardware and software support the GPUDirect protocol. DAOS does not. DAOS does use RDMA, however, and a DAOS-backed storage system with RDMA-capable NICs could, in theory, support GPUDirect, enabling direct data paths.
In this area, object storage use is being promoted by Cloudian, DDN (Infinia), MinIO, Scality, and VAST Data, with the encouraging backdrop of Nvidia’s GPUDirect for object storage protocol.
Without GPUDirect support for either file or objects, DAOS faces an obstacle to its ambitions to get into any Nvidia GPU-using AI storage environment, despite having demonstrated IO500 performance advantages over Lustre and WEKA on a per-server basis.
We note that Google is not all-in with DAOS, as it has a parallel HPC/AI file system track alongside its DAOS-based Parallelstore offering: the Google Cloud Managed Lustre service, powered by DDN. We also note that Intel has its own problems, and DAOS promotion will be low on its list of priorities.
Intel’s DAOS developers did a marvelous job recovering from the Optane fiasco and we have the impressive 2023 Aurora IO500 rating testifying to that. There are no publicly known DAOS adopters beyond the Foundation members and LRZ. DAOS faces an uphill climb to gain wider adoption, and developer adoption is the vital ingredient it needs to grow its market presence.
Developers can find out more at the links below:
