Amazon Web Services pledges cheaper and 12x faster AI speeds with Lustre storage

Scalable file storage service Amazon FSx for Lustre is promising up to 12 times higher throughput to GPU instances, to speed up AI training workloads and “reduce costs”.

To achieve this, says Amazon Web Services, FSx for Lustre now supports Amazon’s Elastic Fabric Adapter (EFA) and Nvidia’s GPUDirect Storage (GDS) system. With this combination, FSx for Lustre can deliver “up to 12x higher throughput” per client instance (1200 Gbps) compared to previous FSx for Lustre systems, said AWS.
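To put the headline figures in perspective, the short Python sketch below converts the quoted 1200 Gbps into gigabytes per second and works out the previous per-client ceiling implied by the “12x” claim. The 3,000 GB dataset size is a hypothetical value chosen only for illustration, and the results assume the peak “up to” rate is sustained, which real workloads may not achieve.

```python
# Back-of-the-envelope arithmetic on the figures quoted by AWS.
# Assumption: peak ("up to") rates are sustained for the whole transfer.

NEW_GBPS_PER_CLIENT = 1200  # gigabits per second, from the announcement
SPEEDUP_FACTOR = 12         # "up to 12x higher throughput"


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def implied_previous_gbps(new_gbps: float, factor: float) -> float:
    """Per-client rate implied for earlier systems by the speedup claim."""
    return new_gbps / factor


def seconds_to_read(dataset_gb: float, gbps: float) -> float:
    """Time to stream a dataset of dataset_gb gigabytes at the given line rate."""
    return dataset_gb / gbps_to_gigabytes_per_second(gbps)


if __name__ == "__main__":
    old_gbps = implied_previous_gbps(NEW_GBPS_PER_CLIENT, SPEEDUP_FACTOR)
    dataset_gb = 3000  # hypothetical dataset size, for illustration only
    print(gbps_to_gigabytes_per_second(NEW_GBPS_PER_CLIENT))  # 150.0 GB/s
    print(old_gbps)                                           # 100.0 Gbps
    print(seconds_to_read(dataset_gb, NEW_GBPS_PER_CLIENT))   # 20.0 s
    print(seconds_to_read(dataset_gb, old_gbps))              # 240.0 s
```

On these assumptions, a single client instance could read the example dataset in 20 seconds rather than four minutes.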

“You can now complete machine learning training jobs faster and reduce workload costs,” the megacorp said in a statement.

Elastic Fabric Adapter is a network interface designed for Amazon EC2 instances, facilitating high-performance networking capabilities. EFA is designed to reduce latency and increase the bandwidth of inter-node communications. This makes it ideal for applications that demand fast data exchanges, such as simulations and modeling.

EFA improves workload performance by using the AWS Scalable Reliable Datagram (SRD) protocol to increase network throughput utilization, and by bypassing the operating system during data transfer. For applications running on high-performance computing instances, such as Trn1 and Hpc7a, organizations can use EFA to achieve higher throughput per client instance.

Nvidia GPUDirect Storage builds on EFA by creating a direct data path between the storage and GPU memory. By allowing data to bypass the CPU, GDS minimizes data transfer delays and reduces the need for redundant memory copies, optimizing GPU performance as a result.

With the combination of EFA and GDS support, applications using P5 GPU instances and Nvidia Compute Unified Device Architecture (CUDA) can achieve the aforementioned “up to 12x higher throughput” per client instance.

EFA and GDS support is available at no additional cost on new FSx for Lustre Persistent-2 file systems, in all commercial AWS Regions where Persistent-2 file systems are available. For more information, organizations can consult the Amazon FSx for Lustre documentation.
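As a rough illustration of what requesting such a file system might look like, the sketch below builds the parameters for an FSx CreateFileSystem request in Python. It is a minimal sketch, not AWS's own example: the helper name build_efa_lustre_request is hypothetical, the subnet ID and storage capacity are placeholders, and the field names (including EfaEnabled) reflect our reading of the FSx API and should be checked against the current documentation before use. Nothing is sent to AWS.

```python
# Sketch only: builds request parameters and does not call AWS.
# Assumption: field names follow the FSx CreateFileSystem API; verify them
# against the current AWS documentation before relying on this.


def build_efa_lustre_request(subnet_id: str,
                             storage_capacity_gib: int,
                             throughput_mb_per_tib: int = 250) -> dict:
    """Hypothetical helper returning kwargs for a Persistent-2 file system
    with EFA (and therefore GDS) support requested."""
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_capacity_gib,  # GiB; valid sizes vary
        "SubnetIds": [subnet_id],
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": throughput_mb_per_tib,
            "EfaEnabled": True,  # assumed flag name for EFA/GDS support
        },
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    request = build_efa_lustre_request("subnet-EXAMPLE", 4800)
    print(request)
    # With boto3 installed and credentials configured, the request could
    # then be passed along as:
    #   boto3.client("fsx").create_file_system(**request)
```

Because the announcement limits support to new Persistent-2 file systems, the sketch sets the option at creation time and does not attempt to modify an existing file system.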