Tom “Pugs” Lyon, Sun’s eighth employee and NFS architect, argues that NFS must die and give way to a block-based protocol for large dataset sharing in the cloud.
He gave a talk at the 2024 Netherlands Local Unix User Group conference in Utrecht in February, which can be watched on YouTube starting at the 6:24:00 mark.
Lyon first established his credentials with impressive stints at Sun, Ipsilon Networks, Netillion, Nuova Systems, and DriveScale.
His presentation’s starting point is that “file systems are great for files but they don’t really give a shit about datasets.” Datasets are collections of files, and a dynamically mountable file system in Linux can be considered a dataset. The datasets used in AI training can be made up of billions of files, which are consumed by thousands of GPUs and also updated by a few hundred agents working on a few thousand files at a time.
NFS works and is popular, with its “killer feature that you can rapidly access arbitrary datasets of any size.” Its purpose is to provide a shared mutable data space. It lets network-unaware applications access large datasets without first having to copy them. But, Lyon argues, shared mutable data is basically a bad idea in concurrent and distributed programming. File sharing is not appropriate in the cloud.
It’s a bad idea because it’s error-prone, and developers are forced to use complex synchronization techniques, such as locks and semaphores, to ensure data consistency. If we share immutable data instead, making copies of it, then there is no need for synchronization, and data consistency can be guaranteed.
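The contrast can be illustrated with a minimal Python sketch. It is not taken from Lyon's talk; the function names and the word-counting task are hypothetical, chosen only to show that a shared mutable structure needs a lock on every update, while workers that read immutable input and write private results need none.

```python
import threading
from collections import Counter

# Illustrative input only: a tuple is immutable, so workers can read it freely.
WORDS = ("nfs", "block", "nfs", "cache") * 1000


def count_shared(words, n_workers=4):
    """Shared mutable dict: every update must hold the lock."""
    totals = {}
    lock = threading.Lock()

    def worker(chunk):
        for word in chunk:
            with lock:  # without this, concurrent read-modify-write can lose updates
                totals[word] = totals.get(word, 0) + 1

    chunks = [words[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


def count_immutable(words, n_workers=4):
    """Immutable input, private outputs: no lock is needed."""
    chunks = [words[i::n_workers] for i in range(n_workers)]
    results = [None] * n_workers

    def worker(i):
        # Each worker writes only to its own slot and never touches shared state.
        results[i] = Counter(chunks[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge the private results after all workers have finished.
    merged = Counter()
    for partial in results:
        merged.update(partial)
    return dict(merged)
```

Both functions return the same totals, but only the first depends on every writer remembering to take the lock.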
In the cloud, he says, sharing layered immutable data is the right way to do it, citing Git, Docker files, Delta Lake, Pachyderm, and LakeFS as successful examples. You can cache or replicate the data and it won’t change. But this can involve too much copying.
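As a rough illustration of what "layered immutable data" means, the following Python sketch models a dataset version as a stack of frozen layers, with reads resolved from the top layer down. It is a toy under stated assumptions, not how Git, Docker, or any of the named systems are implemented; `make_layer` and `read` are hypothetical helpers.

```python
import hashlib
from types import MappingProxyType


def make_layer(files):
    """Freeze a {path: bytes} mapping and derive an ID from its content."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path])
    # MappingProxyType gives a read-only view over a private copy of the dict.
    return digest.hexdigest(), MappingProxyType(dict(files))


def read(layers, path):
    """Resolve a path from the top layer down; upper layers shadow lower ones."""
    for _layer_id, files in reversed(layers):
        if path in files:
            return files[path]
    raise FileNotFoundError(path)


# A new dataset version adds a layer instead of modifying the base layer.
base = make_layer({"a.txt": b"v1", "b.txt": b"base"})
update = make_layer({"a.txt": b"v2"})

dataset_v1 = [base]
dataset_v2 = [base, update]
```

Because a layer never changes after creation, its ID always describes the same content, so a cached or replicated copy cannot go stale. Note that `dataset_v2` reuses `base` rather than copying it.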
Lyon suggests admins consider NVMe over Fabrics (NVMe-oF) and block storage providers to get around the copying issue. NVMe-oF provides “crazy fast remote block device access.” Block semantics are well defined and “aggressively cached by the OS.”
The POSIX-like, distributed file system BeyondFS “allows many different block storage providers” such as Amazon EBS, Azure, and GCP equivalents, OpenEBS, Dell EMC, NetApp, and Pure Storage.
There is much more in his presentation.
Overall, Lyon proposes a new approach to achieve fast, highly scalable, and consistent access to dynamic datasets by using existing file systems, OverlayFS, and NVMe-oF. He wants collaborators to join him in starting a new open source project to build it. He can be reached at pugs@lyon-about.com or on Mastodon @aka_pugs@mastodon.social.