AWS sets up Lustre-based caching filesystem

AWS is running a Lustre-based caching filesystem to give cloud compute fast file access to the distributed file and object datasets it needs to process, including ones held on-premises.

Update, 8 October 2022: AWS’ Sébastien Stormacq has updated the pricing section of his blog.

Amazon File Cache presents a POSIX interface to origin files accessed over NFS v3 – which can sit on-premises or in the public cloud across one or more regions – and also to S3 buckets storing object data.

Sébastien Stormacq, AWS

AWS Principal Developer Advocate Sébastien Stormacq writes that Amazon File Cache “transparently loads file content and metadata (such as the file name, size, and permissions) from the origin and presents it to your applications as a traditional file system. File Cache automatically releases the less recently used cached files to ensure the most active files are available in the cache for your applications.” 

It uses the parallel Lustre filesystem behind the scenes, and a Lustre client has to be installed on the compute instances that will mount the file cache.
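
Getting from the cache to a mount point is standard Lustre. As a rough sketch – assuming the boto3 SDK, a placeholder cache ID, and response fields modelled on the FSx APIs – the mount details can be looked up programmatically and turned into the usual Lustre mount command:

```python
# Hedged sketch: fetch a cache's DNS name and Lustre mount name with boto3 and
# print the mount command to run (as root) on an instance that already has the
# Lustre client installed. The cache ID, region, and mount point are
# placeholders; the response field names are assumed to mirror the FSx
# DescribeFileCaches output and should be checked against the API reference.
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

cache = fsx.describe_file_caches(
    FileCacheIds=["fc-0123456789abcdef0"]   # placeholder cache ID
)["FileCaches"][0]

dns_name = cache["DNSName"]
mount_name = cache["LustreConfiguration"]["MountName"]

# Standard Lustre mount syntax; /mnt/cache is an arbitrary local mount point.
print(f"sudo mount -t lustre -o relatime,flock {dns_name}@tcp:/{mount_name} /mnt/cache")
```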

A cache can link to up to eight NFS filesystems or up to eight S3 buckets – the origins must be uniformly NFS or uniformly S3 – and they are presented as a unified set of files and directories. Stormacq says: “The connection between File Cache and your on-premises infrastructure uses your existing network connection, based on AWS Direct Connect and/or Site-to-Site VPN.”
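
For illustration, creating a cache linked to an NFS origin might look like the boto3 sketch below. The parameter names follow the FSx CreateFileCache API as we understand it, but the subnet, security group, NFS server name and paths are placeholders, and every field should be verified against the current API reference:

```python
# Hedged sketch of creating a File Cache linked to one NFS origin via boto3.
# All identifiers below are placeholders; field names and valid values should
# be confirmed against the FSx CreateFileCache documentation.
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

response = fsx.create_file_cache(
    FileCacheType="LUSTRE",
    FileCacheTypeVersion="2.12",
    StorageCapacity=1200,                    # GiB of cache storage
    SubnetIds=["subnet-0example"],
    SecurityGroupIds=["sg-0example"],
    LustreConfiguration={
        "DeploymentType": "CACHE_1",
        "PerUnitStorageThroughput": 1000,    # MB/s per TiB of cache storage
        "MetadataConfiguration": {"StorageCapacity": 2400},  # GiB of metadata storage
    },
    # Up to eight NFS associations (or S3 ones instead -- a single cache
    # cannot mix NFS and S3 origins).
    DataRepositoryAssociations=[
        {
            "FileCachePath": "/ns1",         # where the origin appears inside the cache
            "DataRepositoryPath": "nfs://nfs1.example.internal/exports/data",
            "NFS": {"Version": "NFS3"},
        },
    ],
)
print(response["FileCache"]["FileCacheId"])
```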

There are two options for loading data from the origin sources into the file cache. Stormacq says: “Lazy load imports data on demand if it’s not already cached, and preload imports data at user request before you start your workload. Lazy loading is the default.”
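
As a hedged illustration of a preload – assuming the cache is already mounted at /mnt/cache and an origin is exposed under /ns1/dataset, both of which are placeholders – each file can be pulled from the origin ahead of the workload with Lustre’s hsm_restore command:

```python
# Sketch of a preload: walk a directory on the mounted cache and ask Lustre to
# restore each file's contents from the origin before the workload runs.
# The mount point and directory path are assumptions for this example.
import pathlib
import subprocess

cache_dir = pathlib.Path("/mnt/cache/ns1/dataset")

for path in cache_dir.rglob("*"):
    if path.is_file():
        # "lfs hsm_restore" fetches the file body from the origin into the cache.
        subprocess.run(["sudo", "lfs", "hsm_restore", str(path)], check=True)
```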

The cached data can be accessed for processing by AWS compute services (instances) in containers or virtual machines. According to Stormacq: “Applications benefit from consistent, sub-millisecond latencies, up to hundreds of GB/sec of throughput, and up to millions of operations per second.” Performance depends on the size of the cache – bigger is better for throughput – and it scales from a starting 1.2TiB (1.32TB) up to the pebibyte level in 2.4TiB increments.
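
Taking those figures at face value – a 1.2TiB starting size growing in 2.4TiB steps, with throughput scaling alongside capacity; treat the exact set of allowed sizes as something to confirm in the console – the capacity series runs like this:

```python
# Toy sketch of the sizing described above: the first few cache capacities,
# starting at 1.2 TiB and stepping by 2.4 TiB. Purely illustrative arithmetic;
# check the File Cache console/API for the capacities it actually accepts.
def cache_capacities_tib(steps: int):
    """Yield cache sizes in TiB: 1.2, then +2.4 per step."""
    size = 1.2
    for _ in range(steps):
        yield round(size, 1)
        size += 2.4

print(list(cache_capacities_tib(5)))   # [1.2, 3.6, 6.0, 8.4, 10.8]
```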

Stormacq’s blog has demos of him setting up the file cache using two Amazon FSx for OpenZFS file systems. He points out: “File Cache encrypts data at rest and supports encryption of data in transit. Your data is always encrypted at rest using keys managed in AWS Key Management Service (AWS KMS). You can use either service-owned keys or your own keys (customer-managed CMKs).”

The pricing is complex. AWS bills users for the provisioned cache storage capacity and metadata storage capacity; details can be found on a pricing page. Stormacq told us: “We do not charge S3 and Direct Connect and network transfer charges. These are all costs that depends on options chosen. If [the] customer use S3 they will be charged for S3 storage and data transfer. If they use their on-prem NFS server with a DX connection, they will be charged for DX etc.” Enjoy working this out.

File Cache is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (London).