A post by Arindam Banerjee, chief Data ONTAP architect at NetApp, says: “Stream data processing is an integral part of big data pipelines and artificial intelligence and machine learning workflows. … ONTAP enables offloading of compute-intensive storage operations from the compute server, freeing up the server resources needed to optimize streaming broker activity.”
He’s talking about the NFS file side of ONTAP, and says that 400 GbitE and 800 GbitE debunk the notion that networked delivery to external filers is a bottleneck for streaming data ingest. Sending the streaming data to such filers using, for example, NFS over RDMA, is as fast as a server sending it to its direct-attached SSDs.
But there is a real problem with NFS when a connected Kafka cluster is resized or re-partitioned for load balancing or maintenance purposes. Kafka runs on a cluster of servers (brokers), taking in log messages from producers (originating systems) and sending them on to destination systems (consumers). Kafka brokers set up data partitions as file directories which contain data files.
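On disk, each partition is simply a directory of log-segment files. The layout looks roughly like this (the topic name and paths are illustrative, not taken from the post):

```
/var/kafka-logs/orders-0/           # partition 0 of a hypothetical "orders" topic
    00000000000000000000.log        # log segment holding the messages
    00000000000000000000.index      # offset index for that segment
    leader-epoch-checkpoint
```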
Note that brokers need not be separate physical servers. They could be containers running on pods in virtualized servers running on physical servers. But the brokers are still independent machines running the Kafka broker process.
The Kafka cluster can crash due to the “silly rename” issue when a partition is moved. Kafka first sets up a new copy of the partition on the destination broker, then deletes the now-redundant old copy on the source broker.
This deletion is where the crash happens, and it’s because of the way the NFS file system works. On an NFS mount, once a file is deleted it can no longer be used by an application. NFS clients therefore emulate what typical Unix file systems do: they allow a file to be deleted while data is still being written to it, with the OS destroying the link to the file in the filesystem metadata – unlinking it – and marking its allocated capacity as available.
The application sending data to the file keeps its handle open, and drive space can only be reclaimed when that handle is closed. The NFS client gets over this incompatibility by renaming an unlinked file to a new name such as “.nfsXXXXX” – thereby concealing the file while it is still in use. This is the so-called silly rename. On the last close of the open file, NFS removes (deletes) it.
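These unlink-while-open semantics are easy to demonstrate on a local Unix filesystem with a minimal Python sketch (on an NFS mount, the same unlink call is what would trigger a silly rename instead):

```python
import os
import tempfile

# Create a file, keep the handle open, then unlink it.
fd, path = tempfile.mkstemp()
f = os.fdopen(fd, "w+")
f.write("still readable")
f.flush()

os.unlink(path)             # the name is gone from the directory...
assert not os.path.exists(path)

f.seek(0)
print(f.read())             # ...but the open handle still reads the data

f.close()                   # only now is the disk space reclaimed
```

On a local filesystem the directory entry disappears immediately; an NFS client instead hides the file behind a “.nfsXXXXX” name until the handle closes.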
As a consequence, file directories containing silly-renamed files cannot be deleted; attempts to do so return a “device not ready” error. When Kafka unlinks a partition it does not close the file handles first, so when it tries to delete a partition directory that contains silly-renamed files it gets the error, and the broker gives up and crashes.
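The failure mode can be mimicked locally. The sketch below is a simulation of the directory-delete step only, not of NFS itself – the silly-renamed file is faked, and on a local filesystem the error surfaces as ENOTEMPTY rather than the error the broker sees over NFS:

```python
import errno
import os
import tempfile

# A partition directory that still holds a leftover file,
# standing in for an NFS silly-renamed ".nfsXXXXX" file.
partition_dir = tempfile.mkdtemp()
leftover = os.path.join(partition_dir, ".nfs000000000001")
open(leftover, "w").close()

try:
    os.rmdir(partition_dir)   # the directory delete that trips the broker up
except OSError as e:
    print("delete failed:", errno.errorcode[e.errno])
```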
Banerjee says: “We leveraged the ‘delete on last close’ feature in the NFS4.x spec to fix the long-standing silly rename issue.”
He blogs: “With the ‘delete on last close’ feature in NFSv4.x, the server is allowed to manage the unlink workflow and is able to orchestrate the operations in the right order without affecting the application. The protocol requires the NFS client and the server to agree on the capability to handle the unlink workflow, and therefore changes are required on both the NFS client and server side. NetApp engineers implemented the changes to both the NFS server side (ONTAP) and the Linux NFS client side (open source) and contributed the changes upstream.”
This means that: “With ONTAP providing high bandwidth NFS storage, Kafka applications can now go faster. Because storage and rebuild operations are now offloaded to the storage system, Kafka brokers can be deployed with less compute, enabling Kafka deployments to be cheaper.”
The changes will be generally available in RHEL 8.7 and RHEL 9.1.