DDN has released performance benchmarks showing it can speed up AI processing time by 27x through the way it handles intermediate KV caching.
An AI LLM or agent, when being trained on GPUs or doing inference work on GPUs and possibly CPUs, stores existing and freshly computed vectors as key-value items in a memory cache, the KV cache. In a GPU server this cache can span two memory tiers: the GPUs’ HBM and the CPUs’ DRAM. As more data enters the KV cache, existing data is evicted. If that data is needed later, it must either be recomputed or, if it was moved out to external storage such as locally attached SSDs or network-attached storage, retrieved, which can be faster than recomputing the vectors. Avoiding KV cache eviction and vector recomputation is becoming table stakes for AI training storage vendors, with DDN, Hammerspace, VAST Data, and WEKA as examples.
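For readers unfamiliar with the mechanics, the toy Python sketch below illustrates the general idea of a two-tier KV cache that evicts entries to slower external storage rather than discarding them, so they can be fetched back instead of recomputed. The class and method names are hypothetical illustrations, not DDN’s, Nvidia’s, or any vendor’s actual implementation.

```python
# Illustrative sketch only: a toy two-tier KV cache with eviction to slower
# external storage. All names here are hypothetical, not any vendor's API.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()   # fast tier (stands in for GPU HBM)
        self.external = {}         # slow tier (stands in for SSD or NAS)
        self.hbm_capacity = hbm_capacity

    def put(self, token_id, kv_vector):
        # When the fast tier is full, move the least-recently-used entry
        # out to external storage instead of throwing it away.
        if len(self.hbm) >= self.hbm_capacity:
            evicted_id, evicted_kv = self.hbm.popitem(last=False)
            self.external[evicted_id] = evicted_kv
        self.hbm[token_id] = kv_vector

    def get(self, token_id, recompute_fn):
        # Fast path: hit in HBM.
        if token_id in self.hbm:
            self.hbm.move_to_end(token_id)
            return self.hbm[token_id]
        # Slower path: fetch from external storage, typically still cheaper
        # than burning GPU cycles on recomputation.
        if token_id in self.external:
            kv = self.external.pop(token_id)
            self.put(token_id, kv)
            return kv
        # Worst case: the key/value vectors must be recomputed on the GPU.
        kv = recompute_fn(token_id)
        self.put(token_id, kv)
        return kv
```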

Sven Oehme, CTO at DDN, states: “Every time your AI system recomputes context instead of caching it, you’re paying a GPU tax – wasting cycles that could be accelerating outcomes or serving more users. With DDN Infinia, we’re turning that cost center into a performance advantage.”
Infinia is DDN’s object storage, redesigned from the ground up over several years. It provides sub-millisecond latency, supports more than 100,000 AI calls per second, and is purpose-built for Nvidia’s H100s, GB200s, and BlueField DPUs. DDN reminds us that Nvidia has said agentic AI workloads require 100x more compute than traditional models. As context windows expand from 128,000 tokens to over 1 million, the burden on GPU infrastructure skyrockets unless KV cache strategies are deployed effectively.
The company says the traditional recompute approach on a 112,000-token task takes 57 seconds of processing time. Tokens are the precursors of vectors, and their count indicates the scope of an AI processing job. When the same job was run with DDN’s Infinia storage, the processing time dropped to 2.1 seconds, a 27-fold speedup. It says Infinia can cut “input token costs by up to 75 percent. For enterprises running 1,000 concurrent AI inference pipelines, this translates to as much as $80,000 in daily GPU savings – a staggering amount when multiplied across thousands of interactions and 24/7 operations.”
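As a back-of-envelope check on those figures, the short Python snippet below reproduces the quoted 27x speedup and spreads the claimed $80,000 daily saving across the 1,000 pipelines; the per-pipeline figure is our own derived illustration, not a number DDN has published.

```python
# Back-of-envelope check of the quoted figures; the per-pipeline cost line
# is a derived illustration, not a DDN-published number.
recompute_seconds = 57.0   # traditional recompute path, 112,000-token task
cached_seconds = 2.1       # same task with the KV cache served from storage
speedup = recompute_seconds / cached_seconds
print(f"Speedup: {speedup:.1f}x")                    # ~27.1x

pipelines = 1_000
daily_savings_total = 80_000                         # quoted upper bound, USD/day
per_pipeline = daily_savings_total / pipelines
print(f"Implied saving per pipeline: ${per_pipeline:.0f}/day")  # $80/day
```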
Alex Bouzari, CEO and co-founder of DDN, says: “In AI, speed isn’t just about performance – it’s about economics. DDN enables organizations to operate faster, smarter, and more cost-effectively at every step of the AI pipeline.”
It is unclear how DDN’s implementation compares to those from Hammerspace, VAST Data, and WEKA, as comparative benchmarks have not been made public. We would suppose that, as KV caching is becoming table stakes, suppliers such as Cloudian, Dell, IBM, HPE, Hitachi Vantara, NetApp, PEAK:AIO, and Pure Storage will add KV cache support using Nvidia’s Dynamo offload engine.
Bootnote
The open source LMCache software also provides KV cache functionality, as does the InfiniGen framework.