SPUs, Arm CPUs and FPGAs: SoftIron talks hardware acceleration

SoftIron, the startup storage supplier, uses Arm processors instead of x86 CPUs for the controllers in its Ceph-based HyperDrive. The product comprises scale-out storage nodes behind a storage router front-end box that processes Ceph storage requests. SoftIron also offers the Accepherator, an FPGA-based erasure coding speed-up card.

In this email interview SoftIron co-founder and CEO Phil Straw explains that Arm-powered storage is not the be-all and end-all. Indeed, FPGA acceleration can go further than Arm CPUs.

Phil Straw

Blocks & Files: What did you learn by using Arm?

Phil Straw: “What we learned with ARM is that the core is not magic in and of itself, but it does have things you can lean on over and above x86. ARM is just a hardware executor of sequence, selection and repetition…and in many ways it is often weaker than x86. It is low power and has a different IO architecture. 

“We took ARM and made it an IO engine that does compute (to serve storage); which makes an awesome storage product in one category. That is to say we did not take a computer and make it a storage product. That’s what we really discovered. ARM for us is low power and awesome at spinning (disk) and hybrid (flash+disk) media.”

Blocks & Files: That’s the disk and disk-flash area. What do you use with all-flash storage?

Phil Straw: “We are also agnostic about technology and we are now servicing SSD/NVMe with x86 and doing the same things (not building computers but coming low level and leveraging AMD Epyc as a storage-from-birth architecture)…the results are also spectacular. More soon.”

Blocks & Files: Tell us about host server CPU offload.

Phil Straw: “I think what is interesting here is not the ARM versus xxx debate or ARM being in any way a magic sauce. What is interesting is that storage, and SDS in particular, has the potential to need extra processing. Adding tasks outside of the main processor does provide advantages. 

“We at SoftIron have already demonstrated erasure coding inline with the network-attach. More to come there too, because the advantages are there to be had. The extra processing can be useful either inline in the network or just as an adjunct to main processing. What is interesting is the idea of hand-off, parallelism and avoiding dilution of input and output to storage and network.”
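Erasure coding of the kind Straw mentions splits data into chunks plus computed parity chunks, so lost chunks can be rebuilt from the survivors. As a rough illustration of the principle only (not SoftIron's or Ceph's implementation, which use Reed-Solomon codes with multiple parity chunks), here is a minimal single-parity XOR scheme:

```python
# Minimal erasure coding illustration: k data chunks plus one XOR parity
# chunk, so any single lost chunk can be rebuilt. Real systems (Ceph
# included) use Reed-Solomon codes tolerating m > 1 losses; this sketch
# only shows the principle, and is not SoftIron's implementation.

def encode(data: bytes, k: int) -> list:
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = chunks[0]
    for chunk in chunks[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks + [parity]

def rebuild(chunks):
    """Recover one missing chunk (marked None) by XOR-ing the survivors."""
    missing = chunks.index(None)
    size = len(next(c for c in chunks if c is not None))
    recovered = bytes(size)  # all zeros
    for i, chunk in enumerate(chunks):
        if i != missing:
            recovered = bytes(a ^ b for a, b in zip(recovered, chunk))
    chunks[missing] = recovered
    return chunks
```

The XOR loop is exactly the kind of repetitive, per-byte choke-point work that an inline FPGA can perform as data streams past, rather than burning host CPU cycles.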

Blocks & Files: Why are x86 server CPUs ill-suited to this kind of work?

Phil Straw: “Often when you take a computer the I/O paths are not optimised, and by that I mean at all the levels in the stack: firmware in the configuration of the processor, NUMA, UEFI/BIOS, kernel, drivers and the storage stack. Also in hardware, as the chip is connected and used. As a result the storage throughput can be diluted by as much as half by this phenomenon (SoftIron empirical test data, proven in x86 and ARM design) or, said another way, doubled by doing it custom for storage.”

Blocks & Files: Is such optimisation all that’s needed?

Phil Straw: “For us the biggest yield has not been in extra compute once the bottom-up design is for storage (and not a computer first), but the laws of compute and physics do always benefit from parallelism. They almost have to always. For this reason we use FPGAs and have replicated hardware that sits inline with the network. This allows choke-point compute that would be serialised in a processor to be handed to a highly parallel engine. The ARM processor network path is similar but different. We tried this path before we ended up using FPGAs for storage acceleration…for these reasons above.”
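The choke-point idea Straw describes can be sketched in software terms: a computation that serialises on one core, but whose inputs are independent, can be fanned out to parallel workers. The sketch below (illustrative only, not SoftIron code) uses checksumming of data chunks as the stand-in workload; an FPGA goes further by replicating the circuit itself rather than scheduling software workers.

```python
# Illustrative sketch: a choke-point computation (checksumming many
# independent data chunks) done serially versus fanned out to parallel
# workers. An FPGA replicates the hardware pipeline instead, but the
# principle -- independent work need not be serialised -- is the same.
import hashlib
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def checksums_serial(chunks):
    # One worker: each chunk waits for the previous one to finish.
    return [checksum(c) for c in chunks]

def checksums_parallel(chunks, workers: int = 4):
    # hashlib releases the GIL for large buffers, so these threads
    # genuinely overlap on multiple cores; results keep input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, chunks))
```

Both paths produce identical results; only the wall-clock cost of the choke point changes with the degree of parallelism available.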


SoftIron is one of three startups using Arm processors for their storage controllers – the others are Nebulon and Pliops.

Nebulon is shipping a storage processing unit (SPU) to control aggregated storage in servers.

Pliops is developing an SPU that functions as a key-value (KV) based storage hardware accelerator. Sitting in front of flash arrays, it speeds storage-related app processing and offloads the host server CPU. It supports both block storage and key-value APIs, and Pliops says its Storage Processor accelerates inefficient software functions to optimise management of data persistence and indexing tasks for transactional databases, real-time analytics, edge applications, and software-defined storage (SDS).
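To see why a key-value interface offloads host work, consider what a host must do when it only has raw block storage: it has to maintain its own index mapping keys to block addresses, spending CPU cycles and RAM on it. A hypothetical sketch (names and structure are illustrative, not Pliops's API):

```python
# Hypothetical sketch of host-side key-value over raw blocks. A
# KV-native accelerator moves the index and its maintenance into the
# device, freeing the host CPU. Names here are illustrative only.

BLOCK_SIZE = 4096

class BlockDevice:
    """Raw block storage: reads and writes fixed-size blocks only."""
    def __init__(self, nblocks: int):
        self.blocks = [bytes(BLOCK_SIZE)] * nblocks

    def write(self, lba: int, data: bytes):
        self.blocks[lba] = data.ljust(BLOCK_SIZE, b"\0")

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]

class HostSideKVStore:
    """KV layered on blocks: the host CPU pays for the index."""
    def __init__(self, dev: BlockDevice):
        self.dev = dev
        self.index = {}   # key -> (lba, length): host cycles and RAM
        self.next_lba = 0

    def put(self, key: str, value: bytes):
        self.dev.write(self.next_lba, value)
        self.index[key] = (self.next_lba, len(value))
        self.next_lba += 1

    def get(self, key: str) -> bytes:
        lba, length = self.index[key]
        return self.dev.read(lba)[:length]
```

A KV-native device takes over everything `HostSideKVStore` does here (indexing, placement, persistence bookkeeping), which is the offload these SPUs are selling.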

The idea of adding hardware-acceleration to storage related processing as a way of both speeding storage work and offloading host server CPUs emerged in 2020. It was part of the overall data processing unit (DPU) concept, with specific hardware used to accelerate storage, security, network and system composing workloads.

Blocks & Files thinks we will hear much more about DPUs and SPUs in 2021.