Quobyte brings GPU-converged storage to AI clusters

Quobyte has introduced GPU-converged storage to bring data in its parallel file system and object storage software closer to the GPUs, and to scale capacity and throughput as GPU servers are added.

The idea is to simplify and speed up data storage for GPU servers by pooling each GPU server's existing local drives, converging them into a shared storage pool across the cluster.

Saurabh Kumar, Quobyte’s head of marketing, writes: “GPU servers contain far more than just GPUs. Each node carries powerful CPUs, large pools of RAM, and fast local NVMe. Yet, in most environments, these resources sit underutilized. At scale, that idle capacity becomes one of the largest hidden inefficiencies in modern AI clusters. GPU-converged storage offers a way to turn this underutilization into value.”

Before hyperconverged infrastructure (HCI) appliances arrived, storage was provided to a group of application servers by external storage arrays across a network link; a storage area network (SAN), for example, provided block storage to application servers. HCI changed this model: a cluster of virtualized application servers, VMware ones for example, uses its local storage drives instead, creating a virtual SAN (vSAN). This did away with the external array and its network links, scaled readily as new application servers were added, and lowered costs.

Quobyte is applying the same idea to GPU servers. These typically have two kinds of processors. There are the GPUs, typically with dedicated high-bandwidth memory (HBM), to run workloads needing highly parallel processing routines, plus a CPU with its DRAM to act as a host processor interfacing the GPU server to the network of systems in which it operates. There are two separate memories: the GPU’s HBM and the CPU’s DRAM.

The GPU server also has local storage drives, generally NVMe SSDs, managed by this CPU, typically an x86 processor. Quobyte's GPU-converged storage turns these local drives into a storage pool that extends across a cluster of GPU servers. Data in this pool is fed into a CPU's DRAM and from there transferred as needed into a GPU's HBM, with high speed and latency much lower than if the data were transferred from an external array. Quobyte's prefetch algorithms work to the architecture's advantage here.
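Quobyte does not publish its prefetch internals, but the general NVMe-to-DRAM staging pattern described above can be sketched. In this illustrative Python sketch (all names and sizes are hypothetical, not Quobyte's API), a background thread reads chunks from a local drive into a bounded pool of in-memory buffers ahead of the consumer; in a real pipeline the consumer's step would be an asynchronous host-to-device copy into GPU HBM rather than simple byte accounting.

```python
import queue
import threading

CHUNK = 1 << 20  # 1 MiB chunks, an illustrative transfer size


def prefetch(path, q):
    # Stage 1: a background thread reads from the local NVMe drive into
    # DRAM buffers, staying ahead of the consumer (this stands in for
    # Quobyte's prefetch logic, which is not public).
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            q.put(chunk)  # blocks when the staging pool is full
    q.put(None)  # sentinel: end of file


def consume(path, depth=4):
    # The bounded queue models a fixed pool of DRAM staging buffers.
    q = queue.Queue(maxsize=depth)
    t = threading.Thread(target=prefetch, args=(path, q), daemon=True)
    t.start()
    total = 0
    while (chunk := q.get()) is not None:
        # Stage 2: in a real pipeline this buffer would be copied into
        # GPU HBM (e.g. an async host-to-device transfer); here we just
        # account for the bytes delivered.
        total += len(chunk)
    t.join()
    return total
```

Because the reader runs ahead of the consumer, the GPU-side copy never waits on a cold read from flash, which is the latency win the converged layout is after.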

Kumar says: “By running storage on the GPU nodes themselves and using their surplus CPU and flash, organizations can reduce cost, power use, appliances, switch ports, and overall infrastructure complexity.”

He cites a power cost saving example: “In a fleet of roughly 10,000 GPU nodes where CPUs average around 30 percent utilization, the unused 70 percent still draws most of its power budget. Even a conservative estimate of 200 watts of idle CPU power per node, multiplied across thousands of nodes running 24/7, can exceed $250 million per year in electricity. These are CPUs that are powered, cooled, and paid for but rarely used to their full potential. GPU-converged storage turns that waste into productive infrastructure.”
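The per-node idle-power arithmetic can be reproduced as a back-of-envelope calculation. The electricity rate below is an assumption, and any fleet-wide dollar figure will additionally depend on cooling, power-delivery overheads, and amortized hardware cost beyond the raw electricity modeled here.

```python
NODES = 10_000           # fleet size from the example above
IDLE_W_PER_NODE = 200    # conservative idle CPU draw cited above
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10     # assumed electricity rate in $/kWh (not from the article)

idle_kwh = NODES * IDLE_W_PER_NODE / 1000 * HOURS_PER_YEAR
annual_cost = idle_kwh * PRICE_PER_KWH
print(f"{idle_kwh:,.0f} kWh/year -> ${annual_cost:,.0f}/year")
# -> 17,520,000 kWh/year -> $1,752,000/year
```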

Our understanding of the memory and storage tiers is as follows. GPU HBM is the top and fastest tier. CPU DRAM is the next, followed by the GPU server's local SSDs, which are pooled by Quobyte's GPU-converged storage software. External networked storage is the final, cold tier, and placement policies in Quobyte's software can move data between it and the GPU-converged storage tier.
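The placement policies themselves are not detailed in the article. A minimal sketch of how such a policy engine might promote and demote data between the converged NVMe tier and cold external storage, with hypothetical names and thresholds that are assumptions rather than Quobyte's API, could look like this:

```python
import time

# Illustrative tier labels only; Quobyte's real policy engine is richer.
HOT, COLD = "gpu-converged", "external"


class Placement:
    def __init__(self, hot_ttl=3600):
        self.hot_ttl = hot_ttl   # seconds of idleness before data is cold
        self.last_access = {}    # object id -> last access timestamp
        self.tier = {}           # object id -> current tier

    def touch(self, obj):
        # Any read or write promotes the object to the converged NVMe tier.
        self.last_access[obj] = time.time()
        self.tier[obj] = HOT

    def sweep(self, now=None):
        # Periodic policy run: demote objects idle longer than hot_ttl
        # to the external cold tier, freeing local NVMe for active data.
        now = now or time.time()
        for obj, ts in self.last_access.items():
            if now - ts > self.hot_ttl:
                self.tier[obj] = COLD
```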

Quobyte notes that GPU servers are not bulletproof and often go out of service. Kumar writes: “GPU nodes do not behave like a typical storage server or appliance. They reboot frequently for updates, run bleeding-edge kernels and software versions, and are often removed or replaced when faulty.”

The company’s fault-tolerant file system software provides cover for such outages. It assumes hardware will fail and ensures that data integrity and availability are maintained when a node outage occurs. GPU-converged storage is kept available through node reboots and failures.
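Quobyte does not describe its redundancy mechanism here, but the general pattern of spreading replicas across nodes so that data survives a reboot or removal can be sketched. This example uses rendezvous (highest-random-weight) hashing, an illustrative technique choice, not necessarily Quobyte's:

```python
import hashlib


def place_replicas(obj_id, nodes, r=3):
    # Rendezvous hashing: score every node against the object id and keep
    # the r highest, so each object's replicas land on r distinct servers
    # deterministically, with no central placement table.
    scored = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{obj_id}:{n}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:r]


def read_targets(obj_id, nodes, down, r=3):
    # On a node outage the object remains readable from surviving replicas.
    return [n for n in place_replicas(obj_id, nodes, r) if n not in down]
```

With three replicas per object, any single GPU node rebooting for a kernel update leaves two copies online, which is the availability behavior the paragraph describes.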

Quobyte says GPU-converged storage lowers overall costs as it “uses the spare CPU, RAM, and NVMe inside your GPU nodes to lower infrastructure spend and power consumption without adding new hardware.” Each added GPU node contributes storage capacity and throughput automatically. There is no separate storage tier to size, deploy, or scale independently. Kumar says this “aligns with the economics of modern AI: scale is accelerating, but power and space are not.”

Interested parties can request access to Quobyte’s GPU-converged storage here.