AI data pipelines matter more than models

Scality, the object storage supplier, believes it is a mistake for competing businesses to think they can differentiate themselves through the particular AI foundation model they use.

Model features will converge. What matters more is getting the right data to the models and agents so that they generate accurate and up-to-date responses.

Giorgio Regni.

Scality CTO Giorgio Regni tells us: “As foundation models become broadly available and increasingly interchangeable, they stop being the differentiator. What matters more now (and will matter even more going forward) is the pipeline: how you collect, shape, govern, and deliver data to those models in a way that’s fast, efficient, and reliable at scale.”

It’s not enough to be a bare object storage supplier. You have to be hooked into AI data pipelines to aid the scanning, filtering, and selection of data. In August, we learnt Scality’s RING object storage can be integrated with a vector database and the LangChain framework to feed data to RAG workflows for AI models like GPT.
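To make the idea concrete, here is a minimal sketch of such a RAG feed in Python, assuming an S3-compatible RING endpoint. The endpoint URL, bucket name, and model choices are illustrative assumptions, not Scality’s published integration code, and FAISS stands in for whichever vector database a deployment actually pairs with LangChain:

```python
# A sketch only: endpoint, bucket, and models are illustrative assumptions.
import boto3
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# RING exposes the S3 API, so a standard boto3 client can read the corpus.
s3 = boto3.client("s3", endpoint_url="https://ring.example.com")

texts = []
for obj in s3.list_objects_v2(Bucket="corpus").get("Contents", []):
    body = s3.get_object(Bucket="corpus", Key=obj["Key"])["Body"].read()
    texts.append(body.decode("utf-8", errors="ignore"))

# Chunk the raw text so each embedding covers one retrievable passage.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.create_documents(texts)

# Index the chunks; FAISS stands in for whatever vector database is used.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# At query time, retrieve relevant chunks and hand them to the model as
# context: the RAG step that keeps answers accurate and up to date.
question = "What changed in the latest design review?"
context = "\n\n".join(
    d.page_content for d in index.similarity_search(question, k=4)
)
answer = ChatOpenAI(model="gpt-4o-mini").invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```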

Regni thinks a change of focus is underway, from models to pipelines: “We’re seeing this shift everywhere. The real advantage lies in the systems that manage the full lifecycle of enterprise data — not just moving it around, but versioning it, enriching it, and keeping full context and control. That’s the part most organizations are still struggling with. And it’s why we believe pipelines, not models, are quickly becoming the true competitive edge.”

He sees the storage media tiering model consolidating to two basic layers: “From our point of view, this shift is architectural, too. The old five-tier pyramid is breaking down. What we see working in real deployments, especially at scale, is a collapse to two tiers: fast local flash on the GPU servers, and object storage for everything else. That’s it. Flash gives you the bandwidth and latency to keep the GPUs busy. Object gives you the scale, durability, and metadata to store and govern everything that’s not actively in use.”

Scality RING seen as an AI data pipeline bulk data and staging store.

The backend object storage is an upstack data feeder and downstack data receiver, not the top level: “Your pipeline needs to understand and operate on that model. The flash becomes your dynamic working set, constantly hydrated with new data and drained for checkpoints, snapshots, and derivatives. Any extra hops, extra tiers, or added complexity? That’s overhead. That’s latency. That’s where GPUs go idle and budgets start bleeding.”

In his view: “The cloud hyperscalers figured this out a while ago. They’re all converging on this simplified stack. And they’re building pipelines that make full use of it, pipelines that move fast, scale wide, and don’t break governance in the process. That’s what the leaders are doing. That’s where the edge is now.”

The corollary is that enterprises and other organizations developing their own AI data pipelines should do the same: follow the hyperscaler two-tier stack model, with flash for two-way data sprints to and from the GPUs, and object storage for the long-term data storage marathon. Scality’s RING XP product uses a GPU server’s local storage drives as the fast access tier, with microsecond-class latencies, on top of bulk object storage.

Scality RING XP Graphic.
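A minimal sketch of that hydrate-and-drain loop, in Python with boto3 against an S3-compatible endpoint; the endpoint URL, bucket names, and mount point are assumptions for illustration, not RING XP’s actual interface:

```python
# A sketch only: endpoint, buckets, and paths are illustrative assumptions.
import os
import boto3

s3 = boto3.client("s3", endpoint_url="https://ring.example.com")
FLASH = "/mnt/nvme/working-set"  # local flash on the GPU server

def hydrate(key: str) -> str:
    """Pull a dataset shard from bulk object storage onto local flash."""
    local = os.path.join(FLASH, os.path.basename(key))
    os.makedirs(FLASH, exist_ok=True)
    s3.download_file("training-data", key, local)
    return local

def drain_checkpoint(local_path: str, key: str) -> None:
    """Push a checkpoint from flash back to durable object storage."""
    s3.upload_file(local_path, "checkpoints", key)

# Typical loop: stage a shard, train on it, persist the checkpoint, then
# evict what is no longer hot to free flash for the next working set.
shard = hydrate("shards/epoch-01/part-0001.tar")
# ... GPU training step on `shard` goes here ...
drain_checkpoint("/mnt/nvme/ckpt/step-1000.pt", "run-42/step-1000.pt")
os.remove(shard)
```

Prefetching the next shard while the current one trains is what keeps the flash working set “constantly hydrated” and the GPUs busy; every extra hop between the two tiers is idle time.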

Two points. First, Regni sees no major role for file storage here, but then he speaks for an object storage supplier that has added file protocol support (NFS and SMB) on top of object. Second, the object layer could itself be tiered, with, for example, a public cloud S3 Glacier-type backend for old and largely inactive objects.
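On that second point, tiering inside the object layer is commonly expressed as an S3 lifecycle rule. A hedged sketch, assuming the backend honours Glacier-style storage classes; the class name and age threshold are illustrative and depend on what the cold tier actually supports:

```python
# A sketch only: the storage class and threshold are assumptions.
import boto3

s3 = boto3.client("s3", endpoint_url="https://ring.example.com")
s3.put_bucket_lifecycle_configuration(
    Bucket="corpus",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "demote-cold-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object
            # Objects older than 180 days move to the cold tier.
            "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}],
        }]
    },
)
```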