
Qumulo hybrid filers offer NVMe performance ‘at the price of disk’

Qumulo has added NVMe SSD caching speed to its hybrid scale-out filers. Customers are also getting new systems that will ‘end forklift upgrades’, along with automatic encryption and faster software upgrades.

Ben Gitenstein.

Ben Gitenstein, Qumulo’s VP of Product, told us: “We are at the fat middle of the market,” by which he means the scale-out filer sweet spot where mainstream enterprise features are required as well as simplicity.

Gitenstein sees enterprise data centre requirements broadening over time, with features such as Kubernetes support, currently needed only in edge cases, becoming mainstream. Once Kubernetes becomes a sweet-spot, aka ‘fat middle’, requirement, we can expect Qumulo to add Kubernetes storage support.

The company builds P-Series all-NVMe flash filers, C and QC Series SSD-cached disk-based filers, and K-Series nearline disk archive filers. These clusterable systems all use the same Qumulo Core file system. Qumulo’s software is also available running in AWS.

There are two hybrid filer model designations, the newer C models and the older QC models, with ‘C’ standing for caching. A numeric suffix gives the raw storage capacity, as in C72T with 72TB of raw capacity. Until now all of them have used SATA SSDs for caching. Qumulo has now added two new C systems, the C192T and C432T, each using six faster NVMe SSDs as the caching medium. The C72T and C168T each use four SATA SSDs.
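To make the naming convention concrete, here is a minimal Python sketch that decodes the raw capacity from a C-series model name and tabulates the cache configurations listed above. The regular expression (which assumes the ‘C’, digits, ‘T’ pattern seen in the four current model names) and the helper name raw_capacity_tb are illustrative assumptions, not a Qumulo tool.

```python
import re

# Cache configuration of the current C-series hybrid models, as listed in the article:
# model name -> (SSD interface used for caching, number of cache SSDs).
C_SERIES_CACHE = {
    "C72T": ("SATA", 4),
    "C168T": ("SATA", 4),
    "C192T": ("NVMe", 6),
    "C432T": ("NVMe", 6),
}

# Assumed naming pattern: "C", then the raw capacity in TB, then "T".
MODEL_PATTERN = re.compile(r"C(\d+)T")


def raw_capacity_tb(model: str) -> int:
    """Decode the raw capacity (TB) from a C-series model name such as 'C72T'."""
    match = MODEL_PATTERN.fullmatch(model)
    if match is None:
        raise ValueError(f"unrecognised model name: {model!r}")
    return int(match.group(1))


for model, (interface, count) in C_SERIES_CACHE.items():
    print(f"{model}: {raw_capacity_tb(model)}TB raw, cached by {count} x {interface} SSDs")
```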

Qumulo claims they offer NVMe performance at the price of disk, but it has not provided comparative performance data. We expect the P-Series to be faster than the C-Series at any similar capacity point, e.g. P184T vs C192T.

The company’s latest data centre filer spec sheet lists the C72T, C168T, C192T and C432T systems, but does not mention the QC systems, suggesting they are on the way out.

Qumulo Data Centre spec sheet.

Software upgrade

Qumulo today announced various software enhancements.

Dynamic Scale enables the use of newly qualified platforms with the latest processors, memory and storage devices without the need for forklift upgrades, data migrations, or complex storage pool management. The new systems can be added to existing clusters.

Qumulo Secure adds AES 256-bit software encryption at rest to the existing role-based access control (RBAC), audit and in-flight encryption. All data is now encrypted automatically.

Software upgrades to Qumulo Core will complete in under 20 seconds with Instant Upgrade, which also automates complete OS upgrades. The S3 support announced as Qumulo Shift in June gets a visual interface.

Qumulo support for NVMe caching is available now with v.0.2. Qumulo Secure is available today in software release 3.1.5, as is the Shift visual interface in 3.3.0. The C192T and C432T systems will be available to order later this month. Qumulo Dynamic Scale will be available on December 15, 2020.

Message in a space capsule: Art works preserved in DNA break free of the biosphere

Artworks stored on DNA will be fired into space in 2022. The pieces, by three members of the all-female Beyond Earth collective, are responses to human population growth, consumption and degradation, and the preservation of Earth’s biodiversity.

The works are digitised and converted from binary data to the DNA bases represented by the letters A, T, G and C. The encoded DNA sequences are synthesised with California startup Twist Bioscience’s silicon-based platform and preserved in a specialised capsule.
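Twist Bioscience’s actual encoding scheme is not described here. The sketch below shows only the basic idea of turning binary data into bases, assuming a naive fixed mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). Production DNA storage codecs typically add error correction and avoid long runs of the same base, which this sketch does not attempt.

```python
# Naive, hypothetical codec: each pair of bits maps to one DNA base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a strand of A/C/G/T, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): read the bases back as bit pairs, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Art")
print(strand)  # CAACCTAGCTCA
assert decode(strand) == b"Art"
```

At two bits per base, every byte of a digitised artwork becomes four bases before any error-correction overhead is added.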

Beyond Earth artworks

Beyond Earth says that DNA is nature’s oldest and most resilient data storage method. No energy or maintenance is required to preserve it, it is ultra-dense and compact, and it lasts hundreds of thousands of years, making it the ultimate time capsule for any digitised artwork.

Beyond Earth’s mission is to explore the frontiers of art, space, and biology through space-bound artworks. To Space, From Earth will endure the test of time, it says, and serve as an important record of human history and the biosphere.

Of course, once in space the artworks will likely constitute write-once-read-never storage – unless they are somehow retrieved and read by our alien overlords.

Computational storage firm NGD touts 12TB ruler SSD

NGD, the computational storage startup, has added a 12TB capacity short ruler-format drive to its lineup.

Computational drives feature an on-board processor, typically an Arm CPU, which performs repetitive and low-level operations on stored data. Typical workloads include compression, encryption, OCR, indexing, search and video transcoding.

Computational storage drives come into their own with large datasets. For example, 104 of the new E1.S-format Newports could hold a petabyte of data. An indexing and search operation on 1PB of data by a server CPU, which has to read the data in from storage, can take many minutes, perhaps more than an hour, depending on the data.

Hand the work over to computational storage and the 104 Newport drives operate in parallel. No data is moved to the server, and the job completes sooner. Nor is the server CPU tied up doing the work; freed of these duties, it can run more virtual machines or containerised applications.
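The petabyte figure is easy to sanity-check. This sketch uses the per-drive capacities NGD quotes for the E1.S Newport (12TB nominal, 9.6TB maximum formatted) and decimal units; pool_capacity_tb and drives_for are illustrative helper names.

```python
import math

TB_PER_PB = 1000  # decimal units, as used for drive capacities

# Per-drive capacities of the E1.S Newport quoted by NGD (TB).
NOMINAL_TB = 12.0
FORMATTED_TB = 9.6
DRIVES = 104


def pool_capacity_tb(drives: int, per_drive_tb: float) -> float:
    """Aggregate capacity of a set of identical drives."""
    return drives * per_drive_tb


def drives_for(target_tb: float, per_drive_tb: float) -> int:
    """Smallest whole number of drives whose combined capacity reaches target_tb."""
    return math.ceil(target_tb / per_drive_tb)


print(f"{DRIVES} x {FORMATTED_TB}TB formatted = {pool_capacity_tb(DRIVES, FORMATTED_TB):.1f}TB")
print(f"{DRIVES} x {NOMINAL_TB}TB nominal = {pool_capacity_tb(DRIVES, NOMINAL_TB):.1f}TB")
print(f"Drives for a full 1PB at formatted capacity: {drives_for(TB_PER_PB, FORMATTED_TB)}")
```

At formatted capacity, 104 drives give 998.4TB, just short of 1PB, while at the nominal 12TB they give 1,248TB.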

Some specs

E1.S is an EDSFF (Enterprise and Data Center SSD Form Factor) format for flash drives that provides greater scalability, flexibility and thermal management than the current disk drive-based U.2 format and the M.2 format. Kioxia announced the 4TB XD6 SSD in the E1.S format earlier this month.

NGD offers two flavours of Newport-branded products: the HCS-8100 NVMe SSDs and the ICS-8100 computational storage drives with in-situ processors. Until now there have been two ICS products: the 16TB U.2 ICS-8100 drive, which shipped in March 2019 and has since been expanded to 32TB, and the 8TB M.2 ICS-8100 drive, released in June 2019.

NGD short ruler Newport drive

An NGD table shows where the new ruler-format drive sits between its M.2 and U.2 drives in capacity and performance terms.

NGD in-situ processing drive options.

NGD’s E1.S drive has up to 12TB of 3D NAND TLC capacity and is slower than Kioxia’s XD6, although its read-write performance profile is more balanced than that of Kioxia’s read-skewed drive. A lot of that is due to Kioxia’s use of the faster PCIe Gen 4 interface rather than the PCIe Gen 3 interface used by NGD.

Kioxia XD6 performance.

Although the E1.S Newport is billed as an up-to-12TB drive, its maximum formatted capacity is considerably less, at 9.6TB. A 6TB model is formatted down to 4.87TB. Similar capacity reductions occur with the M.2 and U.2 Newport ICS-8100 drives.

NGD tells us that formatted capacity varies with workload. The E1.S drive’s maximum capacity is 11.52TB with 7 per cent over-provisioning (OP) for read-centric use.
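The capacity figures above can be expressed as the share of the advertised capacity that is actually usable. A small sketch, using only the numbers NGD quotes; usable_fraction is an illustrative helper.

```python
# (advertised TB, usable TB) pairs quoted for the E1.S Newport.
CAPACITIES = {
    "12TB model, max formatted": (12.0, 9.6),
    "6TB model, formatted": (6.0, 4.87),
    "12TB model, 7% read-centric OP": (12.0, 11.52),
}


def usable_fraction(advertised_tb: float, usable_tb: float) -> float:
    """Share of the advertised capacity that remains available to the user."""
    return usable_tb / advertised_tb


for label, (advertised, usable) in CAPACITIES.items():
    frac = usable_fraction(advertised, usable)
    print(f"{label}: {usable}TB of {advertised}TB = {frac:.1%} usable, {1 - frac:.1%} held back")
```

The 11.52TB figure is 96 per cent of the nominal 12TB, so NGD’s 7 per cent OP is presumably measured against a different baseline, which the company did not specify.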

Your occasional storage digest with MRAM, Commvault, Elbencho and more

Let’s start with GPUDirect.

GPUDirect storage benchmark

Elbencho, a new open-source benchmark for block and file storage, is the first tool to support Nvidia’s GPUDirect Storage (GDS). This lets users assess performance for I/O-intensive GPU workloads that use GPUDirect, whether streaming, random access in large files, lots of small files, or direct access to a block device. Elbencho supports all these modes.

Elbencho’s GDS support was announced, with Nvidia’s official approval, during the current public beta/preview phase of GDS.

Elbencho was created as an easy-to-use, unified replacement for several other storage benchmark tools, such as fio (focused primarily on block storage), ior/iozone (focused primarily on streaming and random I/O in large files) and mdtest (focused primarily on lots of small files). Elbencho can handle all these cases and show live statistics, revealing how stable a storage system’s throughput is for given access patterns.

It shows min/avg/max latency of operations, can include GPUs in the benchmarks by transferring the reads or writes to/from GPU memory via CUDA or GDS, and also supports distributed benchmarks from multiple clients for shared storage systems.
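Elbencho’s own implementation is not shown here. As a rough illustration of what such a report contains, the sketch below derives min/avg/max latency and throughput from per-operation timings; RunStats, summarise and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class RunStats:
    min_us: float
    avg_us: float
    max_us: float
    throughput_mib_s: float


def summarise(latencies_us: Sequence[float], block_size_bytes: int, wall_time_s: float) -> RunStats:
    """Summarise per-operation latencies (microseconds) for one benchmark phase."""
    if not latencies_us or wall_time_s <= 0:
        raise ValueError("need at least one latency sample and a positive wall time")
    total_bytes = block_size_bytes * len(latencies_us)
    return RunStats(
        min_us=min(latencies_us),
        avg_us=mean(latencies_us),
        max_us=max(latencies_us),
        throughput_mib_s=total_bytes / wall_time_s / 2**20,
    )


# Four hypothetical 1MiB reads measured over a 4ms window.
print(summarise([100.0, 120.0, 80.0, 300.0], block_size_bytes=2**20, wall_time_s=0.004))
```

Reporting the maximum alongside the average matters because a single slow operation (300us here) is hidden in an average but can stall a GPU waiting on that data.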

Sven Breuner.

Elbencho was created by Sven Breuner, Field CTO at Excelero, creator of the BeeGFS parallel file system and co-founder of ThinkParQ, the company behind BeeGFS.

Elbencho is available for download from GitHub.

Tachyum nears Prodigy tape-out

Tachyum, the developer of the AI-optimised Prodigy processor, has sent its hardware emulation design to a manufacturer to build FPGA emulation boards prior to chip tape-out. Prodigy chips will run legacy x86, Arm, and RISC-V binaries and outperform the fastest Xeon server CPUs with 10x lower power consumption, according to Tachyum.

Tachyum Prodigy processor

Prodigy will also beat Nvidia’s fastest GPUs on HPC, AI training and inference, according to the startup, which claims it is the only chip that can switch between AI and HPC workloads. This, it says, means more overall efficiency for hyperscale cloud providers.

EMEA goes Metallic

Commvault has launched Metallic data protection SaaS in many EMEA countries including the UK, Belgium, Denmark, Finland, Ireland, Israel, Italy, Luxembourg, the Netherlands and Sweden. Metallic supports customers’ data sovereignty and General Data Protection Regulation (GDPR) compliance efforts. It can integrate with Azure’s security and compliance platform and tools to help address GDPR compliance, and is available through the Microsoft Azure Marketplace.

Everspin’s MRAM sales results

MRAM developer Everspin has reported results for its third quarter ended September 30, 2020. Revenues increased 10.3 per cent Y/Y to $10.1m, with a loss of $3.9m, up $200K from a year ago. MRAM (magnetoresistive RAM) is byte-addressable, non-volatile memory with SRAM (static RAM) and DRAM-class speed. It is claimed to be faster than competing technologies such as 3D XPoint, but is more expensive to manufacture. This makes it compare badly on price-performance grounds, and MRAM take-up is slow.

Shorts

Actian has proclaimed that its Avalanche hybrid cloud data warehouse out-performed top competitors on speed and cost-performance in a recent GigaOm benchmark report. Avalanche beat Snowflake, Amazon Redshift, Azure Synapse and Google BigQuery on performance, and was found to be 8.5X faster than Snowflake at a fraction of the cost.

Datadobi was used by Computex Technology Solutions to migrate the data of a document management system (DMS) application for one of the largest financial and professional services providers in the world. The firm needed to migrate 100TB+ of data, from which a majority of its revenue was derived, to Dell EMC Isilon. The project took 10 weeks, against a competitor’s two-year estimate.

Cloud data lake analysis company Dremio has announced new software delivering sub-second query response times directly on cloud data lakes, with support for thousands of concurrent users and queries. Dremio now integrates with Microsoft’s Power BI data visualisation product. Users can launch Power BI from Dremio and start querying data.

Odaseva, a France-based data protection services provider for big Salesforce customers, has raised $25M in a Series B round led by Eight Roads Ventures.

In-memory database supplier MemSQL has rebranded as SingleStore. The company raised $50m in debt financing in May.

Cloud file services supplier Nasuni is now a Microsoft Windows Virtual Desktop Partner. Microsoft has validated that Nasuni’s file system for the cloud passed its technical certification process and integrates easily with Windows Virtual Desktop.

Next Pathway has announced a self-service version of its SHIFT Migration Suite. This automatically migrates code from legacy sources, including Apache Hadoop, Teradata, Netezza, Greenplum, and SQL Server; and ETL (Extract, Transform, Load) tools like Informatica and DataStage, to various cloud targets.

Panasas says The University of Texas at Dallas School of Arts, Technology, and Emerging Communications (ATEC) has deployed 740TB of ActiveStor storage. ActiveStor supports direct and parallel data flow to 24 Red Hat Enterprise Linux blades in the ATEC Animation and Games render farm. When the farm is at 100 per cent capacity, the render software can send renders for processing to Windows machines in ATEC classrooms.

Compute-on-storage startup Pliops will be at the virtual Flash Memory Summit to demo how the Pliops Storage Processor enables RAID 5 protection 3x faster than RAID 0, and how it promotes low-cost QLC drives to enterprise-class use.

Pliops storage processor

PoINT Software & Systems has released PoINT Data Replicator, which lets users replicate both files and objects to an S3 object or cloud storage system. It replicates file systems into an S3 bucket, using the file path as the object key, so that standard S3 browsers display the original directory structure once replication is complete.
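The article says only that the file path is used as the object key. The sketch below assumes the key is the path relative to the replicated directory, with an optional prefix, and shows why S3 browsers then display folders: they group keys on the ‘/’ delimiter. object_key and top_level_folders are hypothetical names, not PoINT’s API.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def object_key(source_root: str, file_path: str, prefix: str = "") -> str:
    """Hypothetical mapping: the file's path relative to source_root becomes the S3 key."""
    key = PurePosixPath(file_path).relative_to(source_root).as_posix()
    return f"{prefix.rstrip('/')}/{key}" if prefix else key


def top_level_folders(keys: Iterable[str]) -> List[str]:
    """Group keys on '/', as S3 browsers do when listing with a '/' delimiter."""
    return sorted({key.split("/", 1)[0] + "/" for key in keys if "/" in key})


files = [
    "/data/projects/2020/q3/report.pdf",
    "/data/projects/2020/q4/plan.docx",
    "/data/projects/readme.txt",
]
keys = [object_key("/data/projects", f) for f in files]
print(keys)                     # ['2020/q3/report.pdf', '2020/q4/plan.docx', 'readme.txt']
print(top_level_folders(keys))  # ['2020/']
```

S3 itself has a flat namespace; the "directories" a browser shows are simply shared key prefixes.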

The British Antarctic Survey (BAS) is using Quantum Scalar i3 tape libraries on the polar ship RRS Sir David Attenborough to back up scientific data collected at the Earth’s poles. Tape cartridges are carried by ship crew members on flights back to the organisation’s UK office.

Quest Software has announced enhancements to QoreStor and SharePlex. QoreStor adds “Object Direct,” an archive tier allowing backup data to be sent from a QoreStor instance directly to lower-cost AWS Glacier storage. SharePlex can now move Oracle data to Kafka for data streaming, or to SQL Server for offload reporting, in near real time.

Israeli startup Replix provides real-time data replication-as-a-service across geographical distances to multiple public cloud targets. It is a mesh network of data relays transporting data between cloud locations in real time. The fabric spans all public cloud regions, eliminating the operational complexity associated with setting up and maintaining data replication infrastructure.

Software-defined cloud storage supplier RSTOR offers RSTOR Space, a platform-as-a-service (PaaS) S3 storage layer that is agnostic to cloud storage suppliers and optimised for storage efficiency and network throughput. Its decentralised cloud architecture uses geo-distributed erasure coding and a 100Gbit/s infrastructure. It will now support Spectrum Scale 5.1.

Tape and object data protector Spectra Logic has updated its StorCycle file lifecycle management software to add Scheduled Delete, Delete Migrate/Store Projects, Improved Support of Customer Symlinks, Rerun Existing Jobs, CIFS/SMB Support with Linux, Restore Jobs Processed First, and Background Database Indexing.

Life after Nvidia: Mellanox founder and CEO quits

Eyal Waldman

Eyal Waldman, Mellanox CEO, is leaving the networking equipment company he founded, six months after its $7bn acquisition by Nvidia.

Waldman, who pocketed an estimated $270m, told Globes: “When we closed the deal, I already knew that I was leaving. You build a company over 21 years and make all the decisions and you don’t want to be number two.”

Waldman is an outspoken entrepreneur, known for his “left-leaning political views”. For example, last month the company hired 100 outsourced Palestinian sub-contractors as salaried employees. Waldman has also opposed an Israeli government proposal to annex parts of the West Bank.


And life after Mellanox? When asked last year what he would do when the Nvidia deal completed, he replied: “I’ll continue dabbling in philanthropy, but it will not be my main business. I also have a private life—isn’t that okay? I received a lot of offers now, and I’ve turned them all down.”

Here is an offer he has now accepted. Waldman will join the board of Check Point Software, subject to shareholder approval.

Pure Storage hires new sales boss

Paul Mountford, Pure Storage’s sales chief, is leaving after a year at the company. His replacement, Dominick Delfino, a VMware and Cisco veteran, starts immediately.

Before you go thinking the company is in any trouble, Pure has sugared the announcement by releasing preliminary Q3 revenue figures of $410m – better than analyst consensus expectations of $404m.

The news certainly pleased William Blair analyst Jason Ade, who told subscribers: “Overall, we think the positive pre-announcement is indicative of solid execution, best-in-class offerings, and continued share gains.”

Dominick Delfino

Delfino most recently ran VMware’s Americas sales operations and brings his “proven ability and success in as-a-service business models to Pure”. As you might expect, his job is to ramp up sales performance.

Mountford leaves at the end of Pure’s fiscal 2021 year in February. He helped recruit Delfino as CRO and will help establish him in the role. Charles Giancarlo, Pure’s Chairman and CEO, said: “I thank Paul for his significant contributions, his partnership, his leadership, and for his ongoing assistance in a smooth transition.”

The all-flash array business announces formal results for the third quarter on November 24.

SK hynix: If you want to make data centres cleaner, junk the disk drives and buy SSDs instead

SSD maker SK hynix is calling on companies to do their bit for climate change – by replacing disk drives with SSDs.

In an earnings call yesterday discussing fiscal Q3 results, CEO Lee Seok-Hee said it is “notable that the general SSDs and low power consumption SSDs consume 50 per cent and 94 per cent less power than HDDs respectively. Consequently, if all HDD storage in global data centres was replaced by low power consumption SSDs, we can eliminate 41 million tons of carbon dioxide emissions.”

Lee said: “This will lead to approximately more than 4.2 trillion won of social value. The company will try to realise such social value by driving the conversion towards SSD in the data storage market.”

“For data that never stops being generated, the world’s data centre storage capacity has to grow rapidly to as much as 5.1 billion terabytes in 2030, which is a 5.7 times growth in 10 years,” Lee said. “The SSD with outstanding performance in speed and power consumption will grow to about mid-40 per cent, and most SSDs will be substituted from TLC to QLC or PLC that offer better cost per bit.”
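Lee’s two figures imply a starting point and an annual growth rate. Assuming the 10-year window runs from 2020 to 2030 (our assumption; the quote gives no base year), a quick compound-growth check looks like this. implied_cagr is an illustrative helper name.

```python
def implied_cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1 unit into `multiple` units over `years`."""
    return multiple ** (1 / years) - 1


capacity_2030_tb = 5.1e9  # "5.1 billion terabytes in 2030"
growth_multiple = 5.7     # "a 5.7 times growth in 10 years"
capacity_2020_tb = capacity_2030_tb / growth_multiple

print(f"Implied 2020 capacity: {capacity_2020_tb / 1e9:.2f} billion TB")
print(f"Implied growth rate: {implied_cagr(growth_multiple, 10):.1%} a year")
```

That works out at roughly 0.9 billion TB of data centre storage at the start of the decade, growing by about 19 per cent a year.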

Lee Seok-Hee

SK hynix is joining RE100, a global initiative bringing together the world’s businesses committed to 100 per cent renewable electricity, and “aims to ensure all of its power consumption is generated by renewable energy by 2050.”

Lee said: “The dramatic climate change is the problem which impacts not only companies’ economic value but also the survival of human race.”

RE100 was established in 2014 by the Climate Group and 263 companies have joined the initiative. 

The Intel deal

Lee said in the earnings call that SK hynix was a late starter in the NAND business. It is buying Intel’s 3D NAND foundry and SSD business for $9bn to get a leg up, gaining a product portfolio, SSD technology and manufacturing scale. The company aims to triple NAND revenues over five years with the acquisition.

Lee said Intel is “particularly competitive in the data centre SSD market. It is leading the standardisation of PCIe interface [and] has strong firmware and controller technology as well as industry-leading QLC technology.” 

These strengths complement SK hynix’s own strengths in 3D layering, in-house controllers and mobile NAND applications. With Intel’s NAND business, the company “will be able to extend our business opportunities into all areas of NAND.”

SK hynix revenues in the quarter ended 30 September were 8.23tn won ($7.2bn), up 19 per cent Y/Y, with a profit of 1.1tn won ($950m), up 118 per cent Y/Y. DRAM chips represented 72 per cent of revenues and NAND accounted for the rest.

Kioxia XD6 ruler SSD hits the hyperscaler scene

Kioxia has announced the XD6, its first ruler format SSD, which is intended for hyperscalers and complies with the Open Compute Project (OCP) NVMe Cloud SSD Specification.

Ross Stenfort, a hardware storage engineer at Facebook, has supplied a quote: “EDSFF E1.S is the next generation of flash form factors, delivering superior thermals, performance, serviceability, and scalability when compared to current solutions. KIOXIA’s support of EDSFF is a great step forward and lays the groundwork for the future.”

Neville Ichhaporia, a Kioxia America senior marketing and product management director, said in a statement: “Hyperscale data centres are the heart of the internet, and the OCP platform will power future generations with SSDs optimised for server platforms.”  

Seven XD6 drives, with 25mm heatsink, in a carrier.

The XD6 is a short ruler-sized drive using the E1.S form factor, which measures 111.49mm long by 31.5mm wide. Kioxia’s drive comes in three thickness options, 9.5mm, 15mm and 25mm, reflecting different heatsink sizes.

E1.S drives promise more density, performance and reliability, and better thermal management, than the M.2 gumstick format. However, the XD6 offers the same capacities as Kioxia’s prior M.2-format XD5 drive, 1.92TB and 3.8TB, even though it uses denser 96-layer 3D NAND rather than the XD5’s 64-layer flash. But it has a PCIe Gen 4 x4 interface instead of the XD5’s PCIe Gen 3 x4, and goes a heck of a lot faster:

Kioxia XD5 and XD6 performance summary.

The drive, like the XD5, is heavily read-optimised, with sequential read performance up 2.5X and random reads 3.5X faster. It has a 2 million-hour MTTF rating and sustains one full drive write a day for its five-year warranty period. The XD6 supports the SED and TCG Opal 2.0 security standards.

Samsung introduced a PM9A3 E1.S format drive in May, using 128-layer 3D NAND, with 960GB to 7.68TB capacity range. It delivers 900,000/180,000 random read/write IOPS, up to 6.5GB/sec sequential reads and 3.5GB/sec sequential writes.

That makes Samsung’s drive faster at sequential writes, and random reads and writes, than Kioxia’s XD6, as well as having a higher capacity. And it supports 1.3 drive writes a day, but only for three years.
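Drive-writes-per-day ratings with different warranty lengths are easier to compare once converted into total writes over the warranty. This sketch uses the ratings and capacities quoted above; warranty_drive_writes and terabytes_written are illustrative helper names.

```python
DAYS_PER_YEAR = 365


def warranty_drive_writes(dwpd: float, warranty_years: float) -> float:
    """Full-drive writes permitted over the warranty period."""
    return dwpd * DAYS_PER_YEAR * warranty_years


def terabytes_written(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Total write endurance (TBW) implied by a DWPD rating."""
    return capacity_tb * warranty_drive_writes(dwpd, warranty_years)


xd6_writes = warranty_drive_writes(1.0, 5)    # Kioxia XD6: 1 DWPD for five years
pm9a3_writes = warranty_drive_writes(1.3, 3)  # Samsung PM9A3: 1.3 DWPD for three years

for name, capacity_tb, dwpd, years in (
    ("Kioxia XD6 3.8TB", 3.8, 1.0, 5),
    ("Samsung PM9A3 7.68TB", 7.68, 1.3, 3),
):
    print(f"{name}: {warranty_drive_writes(dwpd, years):,.1f} drive writes, "
          f"{terabytes_written(capacity_tb, dwpd, years):,.0f}TB written")
```

Per terabyte of capacity, the XD6’s five-year rating allows more full-drive writes (1,825) than the PM9A3’s three-year rating (1,423.5), although the larger 7.68TB PM9A3 still offers more total terabytes written than the 3.8TB XD6.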

Kioxia’s XD6 drives are sampling to select customers.

WekaIO plugs into Amazon S3 object storage


WekaIO is the latest vendor plugging the Amazon S3 API into its filesystem.

The idea is to combine fast file access, via WekaFS parallel access to SSDs, with a bulk-capacity tier on cheaper disk-based object storage. Hot data goes into flash and cooler data heads for the disk drives. The object store can be deployed on-premises, in the AWS cloud or in IBM Cloud Object Storage.
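The article does not say how WekaFS decides what counts as hot or cool. As a rough illustration only, the sketch below assumes a simple age-based policy: files accessed within a hypothetical 30-day window stay on flash and older files go to the object tier. FileInfo, choose_tier and place are made-up names, and a production tiering engine may weigh other factors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


@dataclass
class FileInfo:
    path: str
    last_access: datetime


# Hypothetical threshold: anything untouched for 30 days counts as "cool".
COLD_AFTER = timedelta(days=30)


def choose_tier(f: FileInfo, now: datetime, cold_after: timedelta = COLD_AFTER) -> str:
    """Return 'flash' for recently accessed (hot) files, 'object' for cooler ones."""
    return "flash" if now - f.last_access < cold_after else "object"


def place(files: Iterable[FileInfo], now: datetime) -> Dict[str, List[str]]:
    """Group file paths by the tier the hypothetical policy assigns them to."""
    placement: Dict[str, List[str]] = {"flash": [], "object": []}
    for f in files:
        placement[choose_tier(f, now)].append(f.path)
    return placement


now = datetime(2020, 11, 1, tzinfo=timezone.utc)
files = [
    FileInfo("/weka/train/batch-001.tfrecord", now - timedelta(hours=2)),
    FileInfo("/weka/archive/2019-results.csv", now - timedelta(days=200)),
]
print(place(files, now))
```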

WekaIO has set up certified partnerships with AWS, Cloudian, IBM, Hitachi Vantara, Quantum and Scality. Customers can use these architectures to add an object storage capacity tier to their WekaFS installation. The object storage can be in the public cloud or on-premises. Object metadata is kept in flash to speed that aspect of object data access. 

Customers see a single global WekaFS namespace in which there can be an object store accessed through WekaFS. Their existing Weka workflows don’t have to change to accommodate the object store.

Weka has also added a snap-to-object feature. This is a point-in-time copy of the entire file and object namespace, captured with a single click and pushed to an object store. From there it can be moved to a remote object namespace, providing instant hybrid workflows, disaster recovery, and test/dev capabilities. The snapshots are immutable and encrypted.

Table stakes

A tidal wave of file storage suppliers has embraced Amazon’s S3 REST API to gain access to object storage. Participants include Dell with PowerScale, IBM (Spectrum Scale), Igneous, NetApp with ONTAP, Pure Storage with both FlashArray (CloudSnap) and FlashBlade, Qumulo and Quobyte. Only Panasas prefers to use its own object storage with an iSCSI definition rather than S3.

Object storage suppliers such as Cloudian, Hitachi Vantara (HCP), IBM (COS), Quantum (ActiveScale), Seagate (CORTX), Scality (RING) and others have all added S3 support to their object storage.

Linbit builds Kubernetes on-ramp for WD OpenFlex

Western Digital’s composable OpenFlex flash storage system now supports Kubernetes storage, courtesy of Linbit’s LINSTOR software.

OpenFlex is a physical chassis, containing SSDs or disk drives, which is addressed as an NVMe target device. It’s basically an NVMe-oF JBOD (Just a bunch of drives) and needs additional software to link it to containerised environments.

Manfred Berger, WD’s senior manager for business development, platforms, said in a statement: “With Linbit’s LINSTOR software added to our OpenFlex offering, the software-defined-storage solution combines the advantages of SDS systems, Linux OS features and composable hardware so that organisations have the confidence they need in their Kubernetes environments.” 

Philipp Reisner, Linbit CEO, said: “With the open-source LINSTOR we bridge the gap between the workload orchestrator (Kubernetes) and the efficient OpenFlex storage devices from Western Digital. In combination delivering high-performance block storage at a very attractive price point.”

B&F LINSTOR-OpenFlex-Kubernetes diagram.

LINSTOR is a configuration management system for storage on Linux systems. The software uses the Linux LVM tool to manage logical volumes and/or ZFS ZVOLs on a cluster of nodes. Linbit has developed a Distributed Replicated Block Device (DRBD) storage construct for Linux, and LINSTOR uses this to provide block storage devices to users and applications.

LINSTOR sees WD’s OpenFlex Composable Infrastructure as an NVMe-accessed storage pool from which it can allocate space to DRBD. OpenFlex has a REST Open Composable API which is used by LINSTOR to do this.

That’s the downstream aspect sorted. On the upstream side, a LINSTOR Operator delivers DRBD capacity to Kubernetes. It installs DRBD in Kubernetes environments, thus linking OpenFlex and Kubernetes, and manages the LINSTOR controller and satellite pods.


Virtana gets new CEO and CFO after new CMO, COO, chairman and funding

Former Dell exec Kash Shaikh has joined Virtana as CEO, and ex-Hazelcast CFO Marion Smith has replaced long-term CFO Peter Dayton.

The appointments are part of an overhaul of the C-suite at the application and infrastructure monitoring company, conducted by exec chairman Ron Sege, who joined the company in April.

Ron Sege

Sege described Shaikh as an “executive with a proven track record of business transformation in data centre and cloud markets”, adding: “Kash is the right person to lead Virtana as we help our Global 2000 customers and their partners use our high-fidelity data sets to plan and optimise their hybrid cloud migrations for cost, capacity, and performance.”

“He has hands-on experience in every aspect of the business necessary to our success and is known for building and motivating great teams.”

Shaikh was Global VP and GM for Dell Technologies’ Enterprise Infrastructure Solutions business. His CV includes a Global Marketing and Business Development VP role at Ruckus Wireless, which was acquired by Brocade. If Shaikh brings a Dell mindset to his new role we can expect the channel to play a large part in Virtana’s future.

Backstory

Prior CEO and president Philippe Vincent left Virtana in April with no succession plan in place. Ron Sege then joined as exec chairman and took over as interim CEO. Just before his appointment, Virtana also replaced CMO Len Rosenthal with Scott Leatherman from Interana, and promoted Lisa Alger from SVP Engineering and Development to COO.

In May Virtana announced a $15m funding round from existing investors HighBar Partners and Benhamou Global Ventures. John Kim, Managing Partner at HighBar, said: “Virtana is poised for worldwide growth in the hybrid cloud infrastructure optimisation market with this new investment and the guidance of Ron Sege.”

The most recent funding event before this $15m infusion was $10m debt financing six years ago, in 2014. 

In eight months Virtana has had a new chairman, CEO, CFO, CMO and COO, and secured $15m in funding. Vincent resigned in the middle of the pandemic, with Sege coming on board to spearhead growth.

We interpret this as indicating that Virtana’s business was heavily affected by Covid-19. The investors decided the exec roster needed overhauling and the company needed extra funding as well, so the CEO was, in effect, pushed out. Now Virtana is focusing on using the public cloud to serve socially distanced customers and help them lower CAPEX and OPEX. That’s the route it sees to growth.

WekaIO races out of the blocks in GPUDirect storage race

WekaIO has chalked up some benchmarks that show it is faster than VAST Data when delivering data to Nvidia’s DGX-2 GPU server via GPUDirect. Dual-port cards and better software probably accounted for the difference.

Update: VAST Data used InfiniBand and not Ethernet in its testing. 4 Nov 2020.

First, a brief recap of GPUDirect, which Nvidia introduced in July. It speeds up the processing of AI and analytics workloads by ensuring Nvidia DGX-2 GPU servers are not left idle by slow data delivery from storage.

GPUDirect technology bypasses a server memory bottleneck by enabling DMA (direct memory access) between GPU memory and NVMe storage drives. It enables the storage system NIC to talk directly to the GPU, avoiding the DGX-2’s CPU and memory subsystem.

Customers use Nvidia’s GPUs to accelerate graphics-related, AI, and ML workloads which can take many hours, or even days, using x86 processors. Such GPU servers can cost millions of dollars, and having them idle while waiting for data is far from ideal. Hence the need for GPUDirect and the focus on sheer data delivery speed by storage suppliers.

Four high-end storage suppliers have declared their support for GPUDirect: DDN, Excelero, VAST Data and WekaIO. VAST and WekaIO have published performance benchmarks with WekaIO’s 97.9GB/sec beating VAST Data’s 92.6GB/sec. Excelero and DDN are yet to publish results.

There is no mention of price/performance in any of the GPUDirect benchmark comparisons. These are not formal industry benchmarks like the SPC-1 suite, where price/performance is a vital measure; here the players are focused on GB/sec to the exclusion of $/GB/sec.

Apples don’t equal apples

VAST Data suggests that comparing its GPUDirect result with Weka’s is not comparing apples to apples, as the two companies used different test scenarios.

WekaIO, working with Microsoft Research, used a “single NVIDIA DGX-2 server connected to a WekaFS cluster over a Mellanox InfiniBand switch the testers were able to achieve 97.9GB/s of throughput to the 16 NVIDIA A100 GPUs using GPUDirect Storage.”

In its test, VAST Data used NFS over RDMA, across InfiniBand, with GPUDirect to achieve its 92.6GB/sec data delivery to a DGX-2.

Nvidia’s DGX-2 has 8 x 100Gbit/s interfaces. A VAST Data source, speaking unofficially, said that 8 x 100Gbit/s comes out at 100GB/sec, a fully saturated line rate. That bandwidth has to carry the data payload, but some percentage of it is lost to link overhead.

That means 8 x 100Gbit/s interfaces do not actually carry a 100GB/sec data payload. VAST Data delivered 92.6 per cent of 100GB/sec, and claims the links are essentially saturated at that rate. They cannot carry any more data.
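The VAST source’s argument is simple unit arithmetic: divide the aggregate link speed in gigabits by eight to get gigabytes. This sketch (line_rate_gb_s is an illustrative helper) ignores protocol overhead, so it gives the raw ceiling both results are measured against.

```python
def line_rate_gb_s(ports: int, gbit_per_port: float) -> float:
    """Raw aggregate line rate in GB/s (8 bits per byte), before any protocol overhead."""
    return ports * gbit_per_port / 8


dgx2_ceiling = line_rate_gb_s(8, 100)  # 8 x 100Gbit/s -> 100.0 GB/s

for vendor, result_gb_s in (("VAST Data", 92.6), ("WekaIO", 97.9)):
    share = result_gb_s / dgx2_ceiling
    print(f"{vendor}: {result_gb_s}GB/sec is {share:.1%} of the {dgx2_ceiling:.0f}GB/sec line rate")
```

On these numbers, WekaIO’s result leaves only about 2 per cent of the raw line rate for link overhead, which is why the VAST source questions it.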

How was WekaIO then able to deliver 97.9GB/sec?

Our VAST source’s best guess is that Microsoft Research is reporting some NIC-to-CPU bandwidth that Nvidia doesn’t, since all Nvidia cares about is data delivery to its GPUs. If that view is correct, WekaIO and VAST are driving the network equally hard, but WekaIO is reporting a small NIC-to-CPU bandwidth slice that VAST doesn’t measure.

Weka told us it wasn’t reporting any such NIC-to-CPU bandwidth.

Other possibilities

Our VAST source mentioned another possible get-out: you have to make sure no one ever uses GiB to mean GB, as a GiB is about 7 per cent bigger. A GiB, or gibibyte, is 1,024 MiB, which works out at 2^30 or 1,073,741,824 bytes. A GB, or gigabyte, is 1,000 MB, or 10^9 (1,000,000,000) bytes.

On that score Weka would be reporting GiB/sec whilst VAST is reporting GB/sec. B&F thinks this is highly unlikely, and WekaIO confirmed it’s using GB and not GiB in its reporting.
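The size of the unit gap, and one reason to doubt it explains the result, can be checked in a few lines. If WekaIO’s 97.9 figure had been in GiB/sec, it would convert to more than the 100GB/sec raw line rate of 8 x 100Gbit/s links, which no payload can exceed. The function name is illustrative.

```python
GB = 10**9   # gigabyte: decimal
GIB = 2**30  # gibibyte: binary, 1,073,741,824 bytes


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes (or GiB/sec) to gigabytes (or GB/sec)."""
    return gib * GIB / GB


print(f"1 GiB = {GIB / GB:.4f} GB ({GIB / GB - 1:.1%} bigger)")
print(f"97.9 GiB/sec = {gib_to_gb(97.9):.1f} GB/sec")
```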

Excelero’s Sven Breuner mentions another potential complication: “There are different models of the DGX-2. They [WekaIO] were likely using a model that had 9 x 100Gbit/s NICs, namely the DGX-2H, which the people typically also just refer to as ‘DGX-2’.”

Nope, WekaIO told us it used a standard, 8 x GPU, DGX-2. It also said it used dual-port interface cards and that could well have contributed to Weka’s faster result. As could the simple possibility that WekaIO’s filesystem is faster than VAST Data’s software.