NetApp claims ‘incumbency is the new cool’ in the AI era

Analysis: NetApp reckons that GenAI inferencing is going to be in widespread use across enterprises and will need access to swathes of connected data – exactly what its intelligent data infrastructure offering provides.

AI inferencing will thrive as data silos are torn down and their contents made accessible to GenAI large language models (LLMs). Modern AI involves two phases: training, in which data is fed at high speed to expensive, powerful GPUs to produce generally skilled LLMs, and inferencing, in which those LLMs respond to user requests. With that in mind, how should storage be optimized for AI?

AI training can take place on-premises or in the public cloud, as can inferencing. Inferencing has an additional need: access to proprietary data through retrieval-augmented generation (RAG), so that a generally skilled LLM can become a specifically skilled one and fill gaps in its knowledge base. Those gaps can cause malformed responses to user requests, commonly referred to as “hallucinations.”
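As a rough illustration of that RAG flow – a sketch, not NetApp’s implementation – the snippet below assumes a generic embedding model, vector search function, and LLM endpoint (all hypothetical callables) and shows proprietary data being retrieved and prepended to the prompt so the model answers from internal content rather than guessing.

```python
# Minimal RAG sketch (illustrative only). embed(), search(), and llm_complete()
# are hypothetical stand-ins for whatever embedding model, vector store, and
# LLM endpoint an enterprise actually runs.
from typing import Callable

def answer_with_rag(
    question: str,
    embed: Callable[[str], list[float]],              # text -> embedding vector
    search: Callable[[list[float], int], list[str]],  # vector, k -> top-k matching chunks
    llm_complete: Callable[[str], str],               # prompt -> model response
    k: int = 5,
) -> str:
    """Ground the LLM's answer in the k most relevant pieces of proprietary data."""
    query_vector = embed(question)                # vectorize the user request
    context_chunks = search(query_vector, k)      # pull matching internal content out of storage
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) +
        "\n\nQuestion: " + question
    )
    return llm_complete(prompt)                   # generally skilled LLM, now specifically grounded
```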

Russell Fishman, NetApp’s senior director of product management for AI and allied topics, reckons connected data for inferencing is squarely in NetApp’s swim lane. GPU server racks, he argues, are becoming extraordinarily demanding of electrical power and capex dollars, and that is going to restrict where AI training takes place.

Fishman cited Nvidia’s Blackwell GPUs, with their need for 120 kilowatts per rack, saying: “I don’t know any enterprises that have a datacenter rack capable of running a 120 kilowatt rack. Not even close. The reality here is that AI training is going to become the reserve of a few very specialized providers. You’re going to rent the training.”

Amplifying this, he said: “There are absolutely some customers who will not rent training. And I’m guessing that that number will never go above double digits,” meaning fewer than a hundred worldwide.

Our tracking of Nvidia GPUDirect certifications shows that specialized high-performance parallel file systems are needed to keep pumping data at fire hose speeds to Nvidia GPU racks – think Lustre and Storage Scale, the renamed GPFS, as used for HPC and supercomputing.

AI training is becoming a specialized niche, whereas AI inferencing is not. Fishman said: “What’s really happened now is that AI is moving out of the development phase, it’s moving into the value phase.”

He’s seeing democratization with “the mass market of AI.” But “now we have a data problem. We have data over here, we have AI data that we want to go training over there. And then eventually, where do those generative models sit? The answer is everywhere. They sit at the edge, they sit at the core, they could sit at the branch, they can sit everywhere, right? And so that’s now a problem.”

“How do I shepherd data and make it easy so that it’s seamless; so that, as a data scientist, and a data engineer, if I need to move data from here to there, I do it in a way that is efficient, fast, and most importantly, remains within the confines of my corporate data governance policies?”

CIOs and similar execs need to think differently, as Fishman indicates: “Suddenly, we need to now move into an ‘I need to actually run this thing’ phase. And I’m delivering a user-facing service that has expectations around uptime, and that sort of stuff.”

This means that general concerns arise about manageability, observability and how data is being used and managed throughout the AI lifecycle. Fishman says: “When you add all those things, then suddenly people are looking through this and thinking ‘OK, do I want all these different data silos?’” 

Data silos, built to serve specific data access and storage needs, impede more general data access. The more internal data an organization’s LLM can reach through RAG, the better its responses. Taking a step back, Fishman says: “The point is, data and storage is becoming more of a general concern for AI as we look across companies. So what actually needs to change?”

This is where NetApp has an advantage, with its pioneering data fabric concept – an “intelligent data infrastructure” – and its customers’ ability to move and access data across the NetApp ONTAP data estate, backed by OEM support, connections to virtually any server you can name, availability in the public clouds, and tens of thousands of customer implementations.

“Incumbency,” Fishman says, “is the new cool.”

So we have this NetApp intelligent data infrastructure platform, holding hundreds of exabytes of customer data in its data estate. That stored data needs to be turned into vector embeddings before an LLM can search it.

I ask Fishman whether NetApp sees it as its responsibility, or in its interest, to provide AI-related, system-level applications on top of that. When customers hold such a massive amount of information in NetApp’s infrastructure, would NetApp want to be involved in providing that processing – to vectorize their installed data estate?

He replies: “I’m not making any announcements about products that are in production or anything else. But I’ll give you my vision. [There are] some things, some elements of what you described, which are intrinsically better delivered from a storage or data management platform.”

“I can think of probably two issues or problems that vector databases have to solve which are tricky and better done in storage. Firstly, the vectorization, the vector embedding, for generating the vector values, according to whatever vector model you want to use.”

“That’s compute-intensive, but more importantly, it’s a very data-intensive function where you’re having to pull the data out. And of course, when I say data, I’m not just talking about files, I’m talking about entities. That could be a slide in a deck or a paragraph in a document, or it could be a column in a table. That is something that you could make a very good case for saying it’s better to do it as close to the data as you can.”

And the second issue?

“Vector databases spend a lot of time and effort trying to work out what’s changed. And some of them do it in a very basic way. They look at the file sizes, they look at last-edited date, that sort of stuff. And that becomes a real problem, because, at the entity level, I don’t want to have to re-vectorize all the content, I just want to go down very precisely to the piece that’s changed. I might even want to do it inline. So just think about it from a data path perspective: you do it inline rather than going offline.”
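A hedged sketch of the two points Fishman raises – embedding at the entity level (a slide, a paragraph, a table column) and re-vectorizing only what has actually changed – might look like the following. It keys each entity on a content hash rather than file size or last-edited date; the names and structures are illustrative assumptions, not NetApp’s design.

```python
# Entity-level incremental re-vectorization sketch (illustrative, not NetApp's code).
# Each entity (slide, paragraph, table column...) is hashed; only entities whose
# content hash has changed are re-embedded, instead of re-processing whole files
# when size or mtime moves. embed() stands in for a real embedding model.
import hashlib
from typing import Callable

def refresh_embeddings(
    entities: dict[str, str],                    # entity_id -> current text content
    index: dict[str, tuple[str, list[float]]],   # entity_id -> (content hash, stored vector)
    embed: Callable[[str], list[float]],         # hypothetical embedding model
) -> int:
    """Update the vector index in place; return how many entities were re-embedded."""
    updated = 0
    for entity_id, text in entities.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        stored = index.get(entity_id)
        if stored is not None and stored[0] == digest:
            continue                             # unchanged entity: skip the costly embedding call
        index[entity_id] = (digest, embed(text)) # re-vectorize just the changed piece
        updated += 1
    return updated
```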

For other GenAI-related processing that sits further up the stack, “we absolutely need to connect to that through a set of open APIs.” He points to an API similar to the Kubernetes Container Storage Interface (CSI), which NetApp donated to the open source community and which “has become the standard for all persistent storage for Kubernetes.”

Fishman says: “We certainly see our role in doing something very similar in establishing those items.”

Data governance is another concern in GenAI inferencing. “Data governance is an interesting topic, because I think most people outside the industry would see data governance as something that holds back data scientists and data engineers from doing their job. But actually, if you go and talk to data scientists and data engineers, it doesn’t hold them back. It actually scares the hell out of them. It scares them that they are going to do something with data which puts the company that pays their wages at risk.”

“There’s legal or regulatory concerns here, which are absolutely real, and then there are commercial concerns.”

“There’s a lot of focus on building guardrails for how data is used and managed further up the stack. But I think about that a little bit like playing Whac-A-Mole, right? I now have to go and build my policies and implement them in all these other tools. What we believe is that’s actually better done at the storage layer.”

“NetApp has a lot of IP in this space. We have this thing called BlueXP data classification, which really understands what’s in the data. There are other types of metadata you’d want to generate, like data lineage, data access controls; there’s a whole bunch of metadata that you’d use to enrich the data entities. And then on top of that, of course, you would have this idea of how that data is made accessible to a user.”
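To make that concrete in a hedged way, a per-entity record enriched with governance metadata – a classification label, lineage, and access controls checked before anything reaches the model – might be shaped roughly like this. The field names are assumptions for illustration, not BlueXP’s actual schema.

```python
# Sketch of a data entity enriched with governance metadata at the storage layer
# (illustrative only; not NetApp BlueXP's actual data model).
from dataclasses import dataclass, field

@dataclass
class EnrichedEntity:
    entity_id: str                                          # e.g. "deck-42/slide-7"
    classification: str                                     # e.g. "public", "confidential", "PII"
    lineage: list[str] = field(default_factory=list)        # upstream sources this entity derives from
    allowed_groups: set[str] = field(default_factory=set)   # who may retrieve it at inference time
    embedding: list[float] = field(default_factory=list)    # vector generated close to the data

def retrievable_by(entity: EnrichedEntity, user_groups: set[str]) -> bool:
    """Enforce access control where the data lives, before a chunk ever reaches the LLM."""
    return bool(entity.allowed_groups & user_groups)
```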

“I think that we as an industry, we’ve been going after this the wrong way, where we have the concept of Whac-A-Mole. Further up the stack is actually not the right way to do it. You want to control data where the data lives, not just where the data is being used.”

“That’s the vision here. That’s why [we’re] so excited about the concept of intelligent data infrastructure, because we believe that is how we are going to deliver that next level of capability in this new era of AI.”

Fishman thinks: “Generative AI is an engine that feeds on the fuel that is data. That data is typically latent data that exists in enterprises … Our customers are looking to people like NetApp to help them unlock it. This is where incumbency starts to become very cool.”

“We’re the kings and queens of unstructured data. We have been for a number of years … And that’s what’s driving this new wave of generative AI. So we’ve seen a huge opportunity to help not just our customers, but other organisations that want to take advantage of generative AI. We think we are the way to make that easy, simple, trustworthy, right? These are the things that we really believe we can bring to the table.”

To conclude, Fishman says NetApp’s intelligent data infrastructure embodies how storage should be optimized for AI.