SSD zoning is great

IBM Research and Radian Memory Systems have found that SSD zoning can deliver three times more throughput and 65 per cent more transactions per second, and greatly extend an SSD’s working life.

SSD zoning involves setting up areas or zones on an SSD and using them for specific types of IO workloads, such as read-intensive, write-intensive and mixed read/write IO. The aim is to better manage an SSD’s array of cells in order to optimise throughput and endurance. A host to which the SSD is attached manages the operation of the zones and relieves the SSD’s Flash Translation Layer (FTL) software of that task.

For its benchmarks, IBM Research used SALSA (SoftwAre Log Structured Array), a host-resident translation layer, to present the SSD storage to applications on the server as a Linux block device. Radian supplies SSD management software to suppliers and OEMs. The test SSD was an RMS-350, a commercially available U.2 NVMe SSD with Radian Memory Systems’ zoned namespace capability and configurable zones.

SALSA controls data placement while the SSD abstracts lower level media management, including geometry and vendor-specific NAND attributes. SALSA also controls garbage collection, including selecting which zones to reclaim and where to relocate valid data, while the SSD carries out other NAND management processes such as wear-levelling in zones.
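To make that division of labour concrete, here is a minimal Python sketch of a host-side, log-structured translation layer over append-only zones. It is a toy under stated assumptions, not SALSA's actual code: the class name `ToyZonedLog`, the greedy "fewest valid blocks" victim policy and the tiny zone sizes are all hypothetical, and wear-levelling is left to the (unmodelled) SSD.

```python
class ToyZonedLog:
    """Toy host-side translation layer: logical block -> (zone, offset).

    Hypothetical illustration only; not SALSA's implementation.
    """

    def __init__(self, num_zones=4, zone_size=4):
        self.zone_size = zone_size
        self.zones = [[] for _ in range(num_zones)]  # append-only (lba, data) lists
        self.mapping = {}                            # lba -> (zone, offset)
        self.free = list(range(num_zones))           # empty zones
        self.open_zone = None                        # zone currently being filled
        self.relocated = 0                           # blocks moved by garbage collection

    def _append(self, lba, data):
        # Data placement: always append to the open zone, opening a free one if needed.
        if self.open_zone is None:
            self.open_zone = self.free.pop(0)
        zone = self.zones[self.open_zone]
        zone.append((lba, data))
        self.mapping[lba] = (self.open_zone, len(zone) - 1)
        if len(zone) == self.zone_size:
            self.open_zone = None                    # zone is full, close it

    def _valid(self, z):
        # A block is still valid if the map points at this exact location.
        return [(lba, data) for off, (lba, data) in enumerate(self.zones[z])
                if self.mapping.get(lba) == (z, off)]

    def collect(self):
        # Garbage collection: pick the full zone with the fewest valid blocks
        # (an assumed greedy policy), relocate survivors, then reset the zone.
        full = [z for z in range(len(self.zones))
                if len(self.zones[z]) == self.zone_size]
        victim = min(full, key=lambda z: len(self._valid(z)))
        survivors = self._valid(victim)
        if len(survivors) == self.zone_size:
            raise RuntimeError("no reclaimable space")
        for lba, data in survivors:
            self._append(lba, data)
            self.relocated += 1
        self.zones[victim] = []                      # zone reset
        self.free.append(victim)

    def write(self, lba, data):
        # Keep one free zone in reserve so relocation always has somewhere to go.
        if self.open_zone is None and len(self.free) <= 1:
            self.collect()
        self._append(lba, data)

    def read(self, lba):
        zone, off = self.mapping[lba]
        return self.zones[zone][off][1]


log = ToyZonedLog()
for i in range(100):
    log.write(i % 8, i)          # repeatedly overwrite eight logical blocks
print(log.read(3), log.relocated)
```

The application only ever sees `write` and `read` on logical block addresses; which zone is reclaimed, and where surviving data lands, is decided entirely on the host.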

By residing on the host instead of the device, SALSA enables a global translation layer that can span multiple devices and optimise across them. The FTL of an individual SSD does not have this capability.
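As a rough illustration of what a global translation layer means, the Python sketch below keeps a single logical-to-physical map whose entries name a device as well as an offset, so that one placement policy can balance writes across drives. Everything in it is an assumption made for illustration: `ToyDevice`, `ToyGlobalMap` and the "fewest blocks written" policy are hypothetical and are not taken from SALSA.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Stand-in for one SSD: an append-only log plus a write counter."""
    name: str
    blocks_written: int = 0
    log: list = field(default_factory=list)


class ToyGlobalMap:
    """One host-side map covering every device (hypothetical sketch)."""

    def __init__(self, devices):
        self.devices = devices
        self.mapping = {}  # lba -> (device index, offset)

    def write(self, lba, data):
        # Assumed policy: place each write on the device written to least so far.
        idx = min(range(len(self.devices)),
                  key=lambda i: self.devices[i].blocks_written)
        dev = self.devices[idx]
        dev.log.append(data)
        dev.blocks_written += 1
        self.mapping[lba] = (idx, len(dev.log) - 1)

    def read(self, lba):
        idx, off = self.mapping[lba]
        return self.devices[idx].log[off]


pool = ToyGlobalMap([ToyDevice("ssd0"), ToyDevice("ssd1")])
for lba in range(10):
    pool.write(lba, f"block-{lba}")
print([d.blocks_written for d in pool.devices])
```

A per-drive FTL sees only its own writes, so it cannot make this kind of cross-device decision.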

IBM and Radian ran fio block-level benchmarks and recorded throughput of 301MB/sec with SALSA, up from 127MB/sec for the same SSD relying solely on its own FTL. There was a 50-fold improvement in tail latencies (latencies outside 99.99 per cent of IOs) and an estimated 3x improvement in flash wear.
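For readers who want the ratio behind the two fio throughput figures, the arithmetic is shown below in Python. The variable names are ours; the only inputs are the 301MB/sec and 127MB/sec numbers quoted above.

```python
# Throughput figures quoted for the fio block-level benchmark (MB/sec).
baseline_mb_s = 127   # same SSD using only its own FTL
salsa_mb_s = 301      # with the host-resident SALSA layer

speedup = salsa_mb_s / baseline_mb_s   # ratio of new to old throughput
gain_pct = (speedup - 1) * 100         # the same change as a percentage increase

print(f"{speedup:.2f}x, +{gain_pct:.0f}%")   # prints "2.37x, +137%"
```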

A MySQL/SysBench system-level benchmark showed a 65 per cent improvement in transactions per second, a 22x improvement in tail latencies and, again, an estimated 3x improvement in flash wear-out.

This is just one demonstration and used a single SSD. A problem with host-managed zoned SSDs is that software on the host has to manage them. This software is not included in server operating systems, and so application code needs modifying to realise the benefits of the zoned SSD. Without a standard and widespread zoned SSD specification, there is no incentive for application suppliers to add zoned SSD management code to their applications.

IBM’s SALSA interposes a management layer between the zoned SSDs and the applications that use them, abstracting zone management away from the applications. That means little or no application code has to be changed. You can read the IBM/Radian case study to find out more.

Ideally, SALSA or equivalent functionality should be open-sourced and added to Linux. This would make it easier for applications to use zoned SSDs. A 3x improvement in throughput and wear, together with the tail latency and transactions/sec improvements, is a worthwhile gain. But without easy implementation, the technology will remain niche.