Arm speeds up compute-on-storage with 64-bit Cortex-R

Arm has released the 64-bit, Linux-capable Cortex-R82 processor, designed specifically for compute-on-storage drives.

Update: GPU supplier Nvidia buys Arm. 14 September 2020.

Such drives run task-specific apps on the data they store, offloading the host and returning results faster. Use cases, Arm says, include video transcoding, database acceleration and real-time data analysis.

Neil Werdmuller, Arm’s director of storage solutions, blogs: “Computational storage is emerging as a critical piece of the data storage puzzle because it puts processing power directly on the storage device, giving companies secure, quick and easy access to vital information.”

Werdmuller presenting an SNIA storage controller session (video).

About 85 per cent of hard disk drive and SSD controllers already use Arm processors for their size, ecosystem and power efficiency, the company said today.

The R82 is Arm’s first 64-bit Cortex embedded real-time processor and is up to two times faster than the previous Cortex-R8 generation. The chip has up to eight cores, supports 1TB of memory, and gets a Memory Management Unit (MMU), plus optional ‘NEON’ acceleration.

Cortex-R82 block component diagram

Until now, Cortex-R processors have lacked an MMU and so could not use virtual memory. The MMU translates the virtual addresses that applications use into physical memory addresses. That lets an operating system such as Linux present a virtual address space larger than physical memory, backed by space on a storage drive. This enables the paging of application code from the storage drive into memory, increasing the number of applications, and the size of their working data sets, that can run over time.
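The address mapping and paging described above can be illustrated with a toy model. This is a minimal sketch, not the R82's actual translation scheme: the 4 KiB page size, the single-level dictionary page table and the names translate, access and PageFault are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped."""


def translate(page_table, virtual_addr):
    """Map a virtual address to a physical one via a toy page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)  # virtual page number, offset
    frame = page_table.get(vpn)
    if frame is None:
        raise PageFault(vpn)
    return frame * PAGE_SIZE + offset


def access(page_table, free_frames, virtual_addr):
    """Translate; on a fault, 'page in' by mapping a free frame and retry."""
    try:
        return translate(page_table, virtual_addr)
    except PageFault as fault:
        vpn = fault.args[0]
        # A real OS would also copy the page contents in from the drive here.
        page_table[vpn] = free_frames.pop()
        return translate(page_table, virtual_addr)


if __name__ == "__main__":
    table = {0: 5}                           # virtual page 0 -> physical frame 5
    print(translate(table, 0x10))            # 5 * 4096 + 16 = 20496
    print(access(table, [7], 4096 + 8))      # faults, maps frame 7: 28680
```

In a real system the page table lives in memory and is walked by hardware, while the fault handler belongs to the operating system; the sketch only shows the division of labour between the two.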

An R82-based storage controller could run multiple applications instead of a single, dedicated storage IO handling app. For example, in quiet periods it could use idle cores to run data analytics, data transcoding or machine learning code. Coincidentally, Arm says the R82 is 14 times faster than its Cortex-R8 for neural network workloads per cycle.

NEON provides Single Instruction Multiple Data (SIMD) capability, in which a single instruction operates on several data elements packed into a wide register, getting more work done per instruction. This speeds performance when the same operation needs to be performed on many data objects, as in digital signal processing and graphics work. Arm suggests using NEON for audio and video processing, voice and facial recognition, computer vision and deep learning.
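To make the SIMD idea concrete, here is a conceptual sketch that counts "instructions" for a scalar add versus a lane-wise add. It models the concept only and does not use NEON itself. The four-lane width (four 32-bit values in a 128-bit register) and the function names scalar_add and simd_add are assumptions for illustration.

```python
LANES = 4  # assumption: four 32-bit lanes in one 128-bit register


def scalar_add(a, b):
    """Add element by element: one 'instruction' per pair of values."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, lanes=LANES):
    """Add in groups of `lanes`: one 'instruction' per group of values."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        # All lanes in the group are added by a single modelled instruction.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


if __name__ == "__main__":
    a, b = list(range(8)), [10] * 8
    print(scalar_add(a, b))  # same sums, 8 modelled instructions
    print(simd_add(a, b))    # same sums, 2 modelled instructions
```

The results are identical; only the instruction count changes, which is why SIMD pays off when the same operation is repeated across a large data set.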

Arm is catching up here with Intel, which introduced Streaming SIMD Extensions (SSE) with its Pentium III processor in 1999. Werdmuller blogs that the R82’s features “will allow storage applications to run new workloads like machine learning at a lower latency.” He thinks that IoT edge applications could use storage drives with this capability.

The Cortex-R82 may be welcomed by computation-on-storage supplier NGD, whose Newport drives use 64-bit Arm Cortex-A53 processors running Ubuntu Linux. Blocks & Files expects a Newport drive using Cortex-R82 processing will emerge.

Update. GPU maker Nvidia is buying Arm Holdings from SoftBank for $40bn. Arm will stay headquartered in the UK, at Cambridge. Nvidia CEO Jensen Huang said: “We will expand on this great site and build a world-class AI research facility, supporting developments in healthcare, life sciences, robotics, self-driving cars and other fields.”

Wells Fargo senior analyst Aaron Rakers told subscribers: “We believe the combination of NVIDIA and Arm will leave investors to consider the possibility that this combination could meaningfully reshape the semiconductor landscape over the next decade (especially in terms [of] the data centre market).”

It is likely Nvidia will accelerate Arm’s push into making server chips. We can expect to see a stronger presence of Arm CPUs in storage arrays and HCI systems in the future. This move gives Nvidia a presence in the storage hardware business and provides future competition for Fungible, Pensando and other DPU suppliers.