Pavilion compares RoCE and TCP NVMe over Fabrics performance

Pavilion Data says NVMe over Fabrics using TCP adds less than 100µs of latency compared with RDMA over RoCE, and is usable at data centre scale.

The company is an NVMe-over-Fabrics (NVMe-oF) flash array pioneer and already supports simultaneous RoCE and TCP NVMe-oF transports.

Head of Products Jeff Sosa told B&F: “We are … supporting NVMe-over-TCP. The NVMe-over-TCP standard is ready to be ratified any time now, and is expected to be before the end of the year.

“We actually have a customer who is deploying both NVMe-oF with RoCE and TCP from one of our arrays simultaneously.”

Pavilion says NVMe-oF provides the performance of DAS with the operational benefits of SAN. Its implementation has full high availability and no single point of failure, and it offloads host processing with centralised data management.

That data management, when used with MongoDB for example, allows users to:

  • Present writeable clones instantly to secondary hosts from centralised storage, avoiding copying data over the network
  • Increase disk space on demand for any host
  • Back up the entire cluster instantly using high-speed snapshots
  • Rapidly deploy a copy of the entire cluster for test/dev/QA using clones
  • Eliminate the need for log forwarding by having each node write log data directly to a shared storage location
  • Orchestrate and automate all operations using Pavilion REST APIs (see the sketch after this list)
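
Pavilion has not published its API details here, so the following is only an illustrative sketch of what REST-driven orchestration of the last two bullets could look like; the base URL, endpoint paths, credentials and JSON fields are assumptions, not Pavilion's documented API.

    # Illustrative only: snapshot a MongoDB volume, clone it, and present the
    # clone to a QA host over a hypothetical Pavilion-style REST API.
    import requests

    ARRAY = "https://pavilion-array.example.com/api/v1"   # hypothetical endpoint
    AUTH = ("admin", "password")                          # placeholder credentials

    # 1. Snapshot the volume backing the MongoDB primary
    requests.post(f"{ARRAY}/volumes/mongodb-primary/snapshots",
                  json={"name": "nightly"}, auth=AUTH).raise_for_status()

    # 2. Create a writeable clone from that snapshot
    requests.post(f"{ARRAY}/snapshots/nightly/clones",
                  json={"name": "qa-copy"}, auth=AUTH).raise_for_status()

    # 3. Export the clone to a secondary test/dev/QA host, avoiding a network copy
    requests.post(f"{ARRAY}/clones/qa-copy/exports",
                  json={"host": "qa-host-01"}, auth=AUTH).raise_for_status()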

Pavilion compared NVMe-oF performance over RoCE and TCP with between 1 and 20 client accessors, and found average TCP latency was 183µs against RoCE's 107µs; TCP latency was 71 per cent higher.
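
Those two averages are where the 71 per cent figure comes from; a quick back-of-the-envelope check:

    # Latency averages quoted by Pavilion, in microseconds
    tcp_us, roce_us = 183, 107

    # Extra latency TCP adds over RoCE, absolute and relative
    delta_us = tcp_us - roce_us                 # 76µs, under the 100µs claim
    overhead_pct = delta_us / roce_us * 100     # ~71 per cent higher
    print(f"TCP adds {delta_us}µs ({overhead_pct:.0f}% higher than RoCE)")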

A Pavilion customer's NVMe-oF TCP deployment was data centre scale rather than rack scale, with up to six switch hops between clients and storage. It was focused on random write latency, serving thousands of 10GbitE non-RDMA (NVMe-oF over TCP) clients and a few dozen 25GbitE RDMA (NVMe-oF with RoCE) clients.

The equipment included Mellanox QSA28 adapters enabling 10/25GbitE breakouts over optical fibre. Four switch ports were consumed to connect 16 physically cabled array/target ports. Eight of those ports were dedicated to RDMA and eight to NVMe/TCP, keeping both options equally “on the menu.”
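
For clients, the choice between the two transports comes down to how they attach to the same subsystem. As a rough sketch, a Linux host would connect with the standard nvme-cli tool, using the tcp transport on the 10GbitE non-RDMA clients and the rdma transport on the 25GbitE RoCE clients; the address and NQN below are placeholders, not details from this deployment.

    # Sketch: attach a Linux client to an NVMe-oF subsystem over TCP or RDMA (RoCE)
    # with nvme-cli. Address and NQN are placeholders.
    import subprocess

    TARGET_ADDR = "192.0.2.10"                          # example array data port
    TARGET_NQN = "nqn.2014-08.org.example:subsystem1"   # example subsystem NQN

    def nvme_connect(transport: str) -> None:
        """Connect using 'tcp' (plain Ethernet) or 'rdma' (RoCE)."""
        subprocess.run(
            ["nvme", "connect",
             "--transport", transport,
             "--traddr", TARGET_ADDR,
             "--trsvcid", "4420",                       # default NVMe-oF service port
             "--nqn", TARGET_NQN],
            check=True,
        )

    nvme_connect("tcp")     # 10GbitE clients without RDMA NICs
    # nvme_connect("rdma")  # 25GbitE RoCE-capable clients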

There were no speed transitions between the storage array and the 10GbitE or 25GbitE clients, which reduced the risk of overwhelming port buffers.

Early results put NVMe/TCP latency (~200µs) at twice that of RoCEv2 (~100µs) but half that of NVMe-backed iSCSI (~400µs). Ongoing experimentation and tuning are pushing these numbers, including iSCSI's, lower.
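
The article does not say exactly how those figures were gathered, but a common way to reproduce this kind of comparison from a client on each transport is a queue-depth-1, 4K random-write fio job against the fabric-attached namespace. A minimal sketch, assuming fio is installed and /dev/nvme1n1 is the remote namespace:

    # Sketch: measure mean 4K random-write latency at queue depth 1 with fio.
    # The device path and runtime are assumptions.
    import json
    import subprocess

    DEVICE = "/dev/nvme1n1"   # fabric-attached namespace (placeholder)

    result = subprocess.run(
        ["fio", "--name=qd1-randwrite",
         "--filename=" + DEVICE,
         "--rw=randwrite", "--bs=4k",
         "--iodepth=1", "--numjobs=1",
         "--direct=1", "--ioengine=libaio",
         "--time_based", "--runtime=30",
         "--output-format=json"],
        capture_output=True, text=True, check=True,
    )

    write_stats = json.loads(result.stdout)["jobs"][0]["write"]
    print(f"mean write latency: {write_stats['clat_ns']['mean'] / 1000:.0f}µs")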

It produced a table indicating how the strengths of RoCE and TCP NVMe-oF differ.

Suppliers supporting NVMe-oF using TCP, besides Pavilion, include Lightbits, Solarflare and Toshiba (KumoScale). Will other NVMe-oF startups and mainstream storage array suppliers support TCP as an NVMe-oF transport? There are no signs yet, but it would seem a relatively easy technology to adopt.

Pavilion’s message here is basically that, unless you need the absolute lowest possible access latency, deploying NVMe-oF over standard Ethernet with TCP looks quite feasible and more affordable than alternative NVMe-oF transports, except perhaps NVMe-oF over Fibre Channel.

B&F wonders how FC and TCP compare as transports for NVMe-oF. If any supplier knows, please do get in touch. B&F