Quobyte brings GPU-converged storage to AI clusters

Quobyte has introduced GPU-converged storage to bring data in its parallel file system and object storage software closer to the GPUs, and to scale capacity and throughput as GPU servers are added.

The idea is to simplify and speed up data storage for GPU servers by pooling each GPU server's existing local drives, converging them into a shared storage pool across the cluster.

Saurabh Kumar, Quobyte’s head of marketing, writes: “GPU servers contain far more than just GPUs. Each node carries powerful CPUs, large pools of RAM, and fast local NVMe. Yet, in most environments, these resources sit underutilized. At scale, that idle capacity becomes one of the largest hidden inefficiencies in modern AI clusters. GPU-converged storage offers a way to turn this underutilization into value.”

Before hyperconverged infrastructure (HCI) appliances arrived, storage was provided to a group of application servers by external storage arrays across a network link; a storage area network (SAN), for example, provided block storage to application servers. HCI changed this model: a cluster of virtualized application servers, VMware ones for example, uses its local storage drives instead, creating a virtual SAN (vSAN). This did away with the external array and its network links, scaled readily as new application servers were added, and lowered costs.

Quobyte is applying the same idea to GPU servers. These typically have two kinds of processors. There are the GPUs, typically with dedicated high-bandwidth memory (HBM), to run workloads needing highly parallel processing routines, plus a CPU with its DRAM to act as a host processor interfacing the GPU server to the network of systems in which it operates. There are two separate memories: the GPU’s HBM and the CPU’s DRAM.

The GPU server also has local storage drives, generally NVMe SSDs, managed by this CPU, typically an x86 processor. Quobyte's GPU-converged storage turns these local drives into a storage pool that extends across a cluster of GPU servers. Data in this pool is fed into a CPU's DRAM and from there transferred as needed into a GPU's HBM, with high speed and latency much lower than if the data were transferred from an external array. Quobyte's prefetch algorithms work to the architecture's advantage here.
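Quobyte does not publish its prefetch internals, but the general NVMe-to-DRAM staging pattern described above can be sketched. In this illustrative Python sketch (all names and sizes are hypothetical, not Quobyte's API), a background thread reads chunks from a local drive into a bounded pool of in-memory buffers ahead of the consumer; in a real pipeline the consumer's step would be an asynchronous host-to-device copy into GPU HBM rather than simple byte accounting.

```python
import queue
import threading

CHUNK = 1 << 20  # 1 MiB chunks, an illustrative transfer size


def prefetch(path, q):
    # Stage 1: a background thread reads from the local NVMe drive into
    # DRAM buffers, staying ahead of the consumer (this stands in for
    # Quobyte's prefetch logic, which is not public).
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            q.put(chunk)  # blocks when the staging pool is full
    q.put(None)  # sentinel: end of file


def consume(path, depth=4):
    # The bounded queue models a fixed pool of DRAM staging buffers.
    q = queue.Queue(maxsize=depth)
    t = threading.Thread(target=prefetch, args=(path, q), daemon=True)
    t.start()
    total = 0
    while (chunk := q.get()) is not None:
        # Stage 2: in a real pipeline this buffer would be copied into
        # GPU HBM (e.g. an async host-to-device transfer); here we just
        # account for the bytes delivered.
        total += len(chunk)
    t.join()
    return total
```

Because the reader runs ahead of the consumer, the GPU-side copy never waits on a cold read from flash, which is the latency win the converged layout is after.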

Kumar says: “By running storage on the GPU nodes themselves and using their surplus CPU and flash, organizations can reduce cost, power use, appliances, switch ports, and overall infrastructure complexity.”

He cites a power cost saving example: “In a fleet of roughly 10,000 GPU nodes where CPUs average around 30 percent utilization, the unused 70 percent still draws most of its power budget. Even a conservative estimate of 200 watts of idle CPU power per node, multiplied across thousands of nodes running 24/7, can exceed $250 million per year in electricity. These are CPUs that are powered, cooled, and paid for but rarely used to their full potential. GPU-converged storage turns that waste into productive infrastructure.”
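The per-node idle-power arithmetic can be reproduced as a back-of-envelope calculation. The electricity rate below is an assumption, and any fleet-wide dollar figure will additionally depend on cooling, power-delivery overheads, and amortized hardware cost beyond the raw electricity modeled here.

```python
NODES = 10_000           # fleet size from the example above
IDLE_W_PER_NODE = 200    # conservative idle CPU draw cited above
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10     # assumed electricity rate in $/kWh (not from the article)

idle_kwh = NODES * IDLE_W_PER_NODE / 1000 * HOURS_PER_YEAR
annual_cost = idle_kwh * PRICE_PER_KWH
print(f"{idle_kwh:,.0f} kWh/year -> ${annual_cost:,.0f}/year")
# -> 17,520,000 kWh/year -> $1,752,000/year
```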

Our understanding of the memory and storage tiers is as follows. GPU HBM is the top and fastest tier. CPU DRAM is the next, followed by the GPU server's local SSDs, which are pooled by Quobyte's GPU-converged storage software. External networked storage is the final, cold tier, and placement policies in Quobyte's software can move data between it and the GPU-converged storage tier.
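The placement policies themselves are not detailed in the article. A minimal sketch of how such a policy engine might promote and demote data between the converged NVMe tier and cold external storage, with hypothetical names and thresholds that are assumptions rather than Quobyte's API, could look like this:

```python
import time

# Illustrative tier labels only; Quobyte's real policy engine is richer.
HOT, COLD = "gpu-converged", "external"


class Placement:
    def __init__(self, hot_ttl=3600):
        self.hot_ttl = hot_ttl   # seconds of idleness before data is cold
        self.last_access = {}    # object id -> last access timestamp
        self.tier = {}           # object id -> current tier

    def touch(self, obj):
        # Any read or write promotes the object to the converged NVMe tier.
        self.last_access[obj] = time.time()
        self.tier[obj] = HOT

    def sweep(self, now=None):
        # Periodic policy run: demote objects idle longer than hot_ttl
        # to the external cold tier, freeing local NVMe for active data.
        now = now or time.time()
        for obj, ts in self.last_access.items():
            if now - ts > self.hot_ttl:
                self.tier[obj] = COLD
```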

Quobyte notes that GPU servers are not bulletproof and often go out of service. Kumar writes: “GPU nodes do not behave like a typical storage server or appliance. They reboot frequently for updates, run bleeding-edge kernels and software versions, and are often removed or replaced when faulty.”

The company’s fault-tolerant file system software provides cover for such outages. It assumes hardware will fail and ensures that data integrity and availability are maintained when a node outage occurs. GPU-converged storage is kept available through node reboots and failures.
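Quobyte does not describe its redundancy mechanism here, but the general pattern of spreading replicas across nodes so that data survives a reboot or removal can be sketched. This example uses rendezvous (highest-random-weight) hashing, an illustrative technique choice, not necessarily Quobyte's:

```python
import hashlib


def place_replicas(obj_id, nodes, r=3):
    # Rendezvous hashing: score every node against the object id and keep
    # the r highest, so each object's replicas land on r distinct servers
    # deterministically, with no central placement table.
    scored = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{obj_id}:{n}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:r]


def read_targets(obj_id, nodes, down, r=3):
    # On a node outage the object remains readable from surviving replicas.
    return [n for n in place_replicas(obj_id, nodes, r) if n not in down]
```

With three replicas per object, any single GPU node rebooting for a kernel update leaves two copies online, which is the availability behavior the paragraph describes.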

Quobyte says GPU-converged storage lowers overall costs as it “uses the spare CPU, RAM, and NVMe inside your GPU nodes to lower infrastructure spend and power consumption without adding new hardware.” Each added GPU node contributes storage capacity and throughput automatically. There is no separate storage tier to size, deploy, or scale independently. Kumar says this “aligns with the economics of modern AI: scale is accelerating, but power and space are not.”

Interested parties can request access to Quobyte’s GPU-converged storage here.