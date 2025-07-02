Interview: In a briefing with Cloudian founder and CEO Michael Tso, we explored the data storage demands of AI and learned that inference could require vast amounts of context data to be stored online and that compute will have to come to this data. This conversation has been edited for brevity and flow.

Blocks & Files: Do you think that AI is going to become really, really important? It’s not just a bubble?

Michael Tso: I think it’s going to be world-changing. I don’t want to exaggerate, but I think this is one of the things where it’s a kind of James Watt-type moment.


I’m not sure where that leaves humanity, because we’re basically automating ourselves right now. We’re automating all the work that we normally do, and I think it is happening at a pace that the soft tissue, the biology, is not able to adapt to. So I think that’s a problem. I think we are going to have to figure out very quickly what we are going to do with ourselves.

Blocks & Files: Are you using some of the advanced AI inside Cloudian, for instance, for doing some programming work?

Michael Tso: It’s always supervised. We’re having AI write some of the code, and it’s very good at analyzing code, fixing things, and telling you why it’s not working. We were just doing some testing on a GPU server and we got stuck. It looked like some kind of hardware setup wasn’t right. So we just asked the AI, what should the right kind of file setup be? Normally something like this would take ages, because it’s a new box and you’re not going to find anything on it. But it gave us a bunch of suggestions. We tried them, we got past where we were stuck, and things got much faster. It’s like any tool: we need to learn how to use it to our advantage.

Blocks & Files: How do you see the future here? Will Cloudian carry on providing a fast, reliable storage layer to feed AI and other applications?

Michael Tso: I believe compute’s going to come to the data. Data is getting so big that compute’s going to have to come to it.

I think what we have right now in hyperconverged is one end of the spectrum, where the compute is using small amounts of data and you pull the data into the compute, which is a lot faster. All these years, Cloudian has been working on the opposite end of the spectrum, with the idea that data becomes so heavily gravitational that it’s going to attract the compute. Nvidia actually shares this vision.
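To put rough numbers on data gravity, the sketch below estimates how long it takes to move datasets of various sizes over a single network link. The 100 Gb/s link speed and the assumption of full line rate with no protocol overhead are illustrative choices, not figures from the interview; real transfers would be slower.

```python
def transfer_hours(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over one link of link_gbps (decimal Gb/s).

    efficiency < 1.0 models protocol overhead; 1.0 means ideal line rate.
    """
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Illustrative sizes (decimal units) over an assumed 100 Gb/s link.
for label, size in [("1 TB", 1e12), ("1 PB", 1e15), ("1 EB", 1e18)]:
    print(f"{label}: {transfer_hours(size, 100):,.2f} h")
```

Under these assumptions this prints 0.02 h for a terabyte, 22.22 h for a petabyte and 22,222.22 h (roughly two and a half years) for an exabyte, which is the intuition behind moving compute to the data rather than the reverse.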

Where we are going right now is building Cloudian into a full-fledged data processing platform. We’re no longer a storage-only platform. The idea is that we’ll take data in and turn it into different formats that can be easily consumed and processed by different AI tools. Think of it this way: normally you would take documents from a company, we would store them, and we’d put a legal hold and all that stuff on them. But now, when the data comes in, we will vectorize it and put it in a vector database.
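The ingest-time vectorization described above can be sketched as follows. This is a minimal illustration, not Cloudian’s implementation: the `Store`, `VectorIndex` and `embed` names are hypothetical, and the hashed bag-of-words `embed` function is a toy stand-in for a real embedding model. The point is only that the raw object and its retention controls are kept as before, while a vector is written to a searchable index in the same step.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 256  # hypothetical embedding size for this toy example


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy hashed bag-of-words embedding (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int.from_bytes(hashlib.sha256(word.encode()).digest()[:8], "big")
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorIndex:
    """Minimal in-memory vector index with brute-force cosine search."""
    keys: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, key: str, vector: list[float]) -> None:
        self.keys.append(key)
        self.vectors.append(vector)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Vectors are unit-normalized, so the dot product equals cosine similarity.
        scored = [(key, sum(a * b for a, b in zip(query, vec)))
                  for key, vec in zip(self.keys, self.vectors)]
        scored.sort(key=lambda kv: kv[1], reverse=True)
        return scored[:k]


class Store:
    """Hypothetical object store that also vectorizes documents on ingest."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}
        self.legal_hold: set[str] = set()
        self.index = VectorIndex()

    def put(self, key: str, text: str, hold: bool = False) -> None:
        self.objects[key] = text          # the raw object is stored as before
        if hold:
            self.legal_hold.add(key)      # retention controls still apply
        self.index.add(key, embed(text))  # new step: vectorize at ingest time
```

For example, after `store.put("contract-001", "supply agreement for gpu servers", hold=True)`, a query such as `store.index.search(embed("gpu supply contract"), k=1)` can retrieve the contract by meaning-adjacent terms rather than by its key.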

The way we see it, Cloudian is a true platform, in the sense that you can ingest data and plug in modules that process the data, and these modules can then create their own data. We built prototypes of this very early on. We were interested in video, and we could do automatic tagging on a video.

That’s one example. But imagine doing this at a very big scale, on any kind of data, with any kind of plug-in. The plug-ins we are working on right now are the Nvidia inferencing microservices: NeMo Retriever and NIM.
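A plug-in platform of the kind described above can be sketched as a registry of modules that each inspect newly ingested objects and may emit derived objects, which are ingested in turn. Everything here is hypothetical: `Module`, `VideoTagger` and `Platform` are illustrative names, and `VideoTagger` writes placeholder tags where a real module would call a vision or inference service.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class StoredObject:
    key: str
    data: bytes
    content_type: str


class Module(Protocol):
    """Hypothetical plug-in interface: inspect an object, optionally emit derived objects."""

    def accepts(self, obj: StoredObject) -> bool: ...

    def process(self, obj: StoredObject) -> list[StoredObject]: ...


class VideoTagger:
    """Toy stand-in for an automatic video-tagging module."""

    def accepts(self, obj: StoredObject) -> bool:
        return obj.content_type.startswith("video/")

    def process(self, obj: StoredObject) -> list[StoredObject]:
        tags = b"tags: placeholder"  # a real module would run a model here
        return [StoredObject(obj.key + ".tags", tags, "text/plain")]


class Platform:
    def __init__(self) -> None:
        self.objects: dict[str, StoredObject] = {}
        self.modules: list[Module] = []

    def register(self, module: Module) -> None:
        self.modules.append(module)

    def ingest(self, obj: StoredObject) -> None:
        self.objects[obj.key] = obj
        # Each module that accepts the object may create new data, which is
        # ingested in turn so that other modules can build on it.
        for module in self.modules:
            if module.accepts(obj):
                for derived in module.process(obj):
                    self.ingest(derived)
```

Because derived objects go back through `ingest`, one module’s output (for example, text tags) can become another module’s input; a real system would also need to guard against modules that re-accept their own output.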

What we are working on now is the inferencing pipeline. You take a user question and you first add all the context to the question. Then you take the context and pass it to your AI model: first to your local model, which has all your enterprise knowledge, then to the global model, which has knowledge of the universe. You combine these and then you go back and check whether this is the answer the user is looking for. If it’s not right, then you’ve got to fix something and go through it again. That’s the inferencing pipeline.

And at the end you come out with an answer, and then you go around again. So storage is incredibly important in this pipeline because, one, that whole “my data” concept is key. That data is always going to stay inside the enterprise. Two, the thing people miss is the user data. In order for me to know what you are asking, I actually have to load the history of all the past conversations I’ve had with you.
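The pipeline described in the last two paragraphs can be sketched as a loop: load the user’s history and enterprise context, query a local model and then a global model, check the combined answer, retry if needed, and finally persist the exchange. This is an illustrative sketch under stated assumptions, not Nvidia’s or Cloudian’s pipeline: `HistoryStore` and `answer` are hypothetical names, the models and the acceptance check are passed in as plain functions, and “combining” is modeled simply as feeding the local model’s output to the global model.

```python
from collections import defaultdict
from typing import Callable

Model = Callable[[str], str]


class HistoryStore:
    """Hypothetical per-user conversation store; durable online storage in practice."""

    def __init__(self) -> None:
        self._turns: dict[str, list[str]] = defaultdict(list)

    def load(self, user: str) -> list[str]:
        return list(self._turns[user])

    def append(self, user: str, *turns: str) -> None:
        self._turns[user].extend(turns)


def answer(user: str, question: str, history: HistoryStore, enterprise_docs: list[str],
           local_model: Model, global_model: Model,
           is_acceptable: Callable[[str], bool], max_rounds: int = 3) -> str:
    # 1. Add context: the user's past conversations plus enterprise documents.
    context = history.load(user) + enterprise_docs
    draft = ""
    for _ in range(max_rounds):
        prompt = "\n".join(context + [question])
        # 2. Local model first (enterprise knowledge), then the global model.
        local = local_model(prompt)
        draft = global_model(prompt + "\n" + local)
        # 3. Check the combined answer; if it is not right, adjust and go around again.
        if is_acceptable(draft):
            break
        context.append("rejected draft: " + draft)
    # 4. Persist this exchange so the next question arrives with more context.
    history.append(user, question, draft)
    return draft
```

Step 4 is where the storage demand comes from: every exchange grows the per-user history that step 1 must load on the next question.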

Blocks & Files: You have to have context for the user to enable you to interpret the question?

Michael Tso: That’s exactly right. [With the AI tools] in the beginning, it was a Q&A thing. It did not remember anything. But now it remembers everything about you. You might have heard of things called LMCache and KV cache. What these do is basically cache your previous conversations. Essentially they cache the token input and the token output, so the model doesn’t have to recalculate them. And they cache that in a vectorized way, with very fast search.
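The idea of not recalculating earlier tokens can be illustrated with a toy prefix cache. In real transformer inference engines the KV cache holds the attention key and value tensors already computed for processed tokens, and projects such as LMCache store and reuse those caches across requests; the sketch below does not model any of that machinery. It is a hypothetical illustration of the principle only: states are keyed by token prefix, the longest cached prefix is reused, and only the new suffix is computed.

```python
from typing import Callable

State = tuple[int, ...]  # toy stand-in for per-token key/value tensors


class PrefixCache:
    """Toy prefix cache: reuse work already done for a previously seen token prefix."""

    def __init__(self, step: Callable[[State, int], State]) -> None:
        self.step = step  # processes one token given the state so far
        self.cache: dict[tuple[int, ...], State] = {(): ()}
        self.steps_computed = 0

    def run(self, tokens: list[int]) -> State:
        # Find the longest prefix of `tokens` whose state is already cached.
        # This terminates because the empty prefix is always cached.
        n = len(tokens)
        while tuple(tokens[:n]) not in self.cache:
            n -= 1
        state = self.cache[tuple(tokens[:n])]
        # Only the uncached suffix is computed; each new prefix is stored.
        for i in range(n, len(tokens)):
            state = self.step(state, tokens[i])
            self.steps_computed += 1
            self.cache[tuple(tokens[: i + 1])] = state
        return state


def toy_step(state: State, token: int) -> State:
    # Hypothetical per-token computation standing in for a transformer layer stack.
    return state + (token * 2,)
```

Running `[1, 2, 3]` computes three steps; running `[1, 2, 3, 4, 5]` afterwards computes only two more. Storing every prefix as a separate key is deliberately naive; the trade-off it shows is that skipping recomputation costs storage that grows with every conversation.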

Blocks & Files: So in a particular AI question-and-response session, you need the tokens for that session when the response is first made, and you need the tokens for that user’s context. Which means you have to store them.

Michael Tso: That’s right – and you have to store that forever, for billions of people.

Blocks & Files: On what? Tape?

Michael Tso: It cannot be on tape. This is online. It has to be online. I think it’s going to be a tiered solution. I think it’s going to be difficult to store all our lives on NAND.

Blocks & Files: You’re talking exabytes.

Michael Tso: Yes, exactly. When Nvidia first came to Cloudian and said they wanted to work with us, I was like, “Hey, why do you need to work with us? You seem quite happy with DDN and VAST.” They said: “Well, we need you guys for training because data is getting bigger, but more importantly, we need you guys for inferencing,” and “inferencing needs a lot of storage.”

I’m like, really? The models are kind of small, aren’t they? I didn’t understand that at the time, [but then] I realized: oh my God, who wants to talk to an AI that I have to tell, every time, OK, I’m a 50-something-year-old male? That’s stupid, right? The great thing about having an assistant who’s worked for you for 20 years is that they know everything about you. So you ask for something and they give you the right answer. That’s what we expect our AI to be. We expect it to remember everything about us forever. Everything we’ve told it, it should know, right? So if you think about storing this, I mean, this is immense. This is an immense amount of storage.
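A back-of-envelope calculation shows how per-user memory could reach the exabytes mentioned earlier. It uses the standard size of a transformer KV cache per token, 2 (keys and values) × layers × KV heads × head dimension × bytes per element. Every input below is an assumption for illustration: a model shape typical of an 8-billion-parameter model with grouped-query attention (32 layers, 8 KV heads, head dimension 128, 16-bit values), 100,000 tokens of retained history per user, one billion users, and about 4 bytes of raw text per token.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # Factor of 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


# All of these figures are illustrative assumptions, not numbers from the interview.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2)
tokens_per_user = 100_000
users = 1_000_000_000

kv_total = per_token * tokens_per_user * users
text_total = 4 * tokens_per_user * users  # assumed ~4 bytes of raw text per token

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache, all users: {kv_total / 1e18:.1f} EB")
print(f"Raw text, all users: {text_total / 1e12:.0f} TB")
```

With these inputs the script prints 128 KiB per token, 13.1 EB of KV cache across all users, and only 400 TB of raw text. The gap suggests why keeping precomputed state, rather than just transcripts, is what pushes the total into exabytes; different model shapes or retention policies would change the figures substantially.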

Blocks & Files: You must, I’m thinking, have talked to customers, potential customers about this, and they must have sat across the table from you. What did they say?

Michael Tso: Everybody’s reaction is: “Wow, we never thought about it like this.” This is real. This is exactly everybody’s reaction.

Basically, a year and a half ago, Nvidia realized that the money in AI is initially all in training, but eventually it’s in inferencing, because everybody’s going to have to do it. And for the money to be made in inferencing, you actually have a storage problem. It’s not only a compute problem. They have a storage problem, and they need distributed, large-scale storage.

And what does that mean? Distributed, large-scale storage at reasonable cost. That’s object storage. So that was why they came to talk to us. You are going to want something that’s tiered, that allows you to start small and grow.
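A tiered, all-online layout of the kind described here and in the earlier answer about NAND can be sketched as a placement policy. The two tiers, the seven-day hot window and the `choose_tier`/`plan` names are hypothetical; the sketch only shows the principle that recently used context stays on flash while older context moves to cheaper disk-backed object storage, and that neither tier is offline tape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    FLASH = "flash"  # NAND: lowest latency, highest cost per byte
    DISK = "disk"    # HDD-backed object storage: still online, cheaper per byte


@dataclass
class ContextObject:
    user: str
    size_bytes: int
    last_access: datetime


def choose_tier(obj: ContextObject, now: datetime,
                hot_window: timedelta = timedelta(days=7)) -> Tier:
    """Hypothetical policy: recently used context stays on flash, the rest moves to disk."""
    return Tier.FLASH if now - obj.last_access <= hot_window else Tier.DISK


def plan(objects: list[ContextObject], now: datetime,
         hot_window: timedelta = timedelta(days=7)) -> dict[Tier, int]:
    """Total bytes that would land on each tier under the policy above."""
    totals = {tier: 0 for tier in Tier}
    for obj in objects:
        totals[choose_tier(obj, now, hot_window)] += obj.size_bytes
    return totals
```

Because the policy only decides placement, the flash tier can start small and the disk tier can grow independently, which matches the “start small and grow” requirement; a real system would also migrate objects back to flash when they become active again.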

We come from the enterprise. So we looked at it and said, oh, that’s interesting. What does that mean for us? It means that the architecture we already have, a peer-to-peer distributed architecture that can spread around the world, is perfect, right? We already have a way to plug in and do compute on it. Really perfect. What we have to do now is integrate far more compute into the platform, sooner than we thought we would have to.

Do you put GPUs in every Cloudian storage node, which right now only has CPUs, NICs, and storage? Do you put the GPUs in these nodes?

There is another way: within our storage cluster, you build a compute cluster. We have traditionally believed that you want compute as close as practical to the data. So we will end up with what we do now, which is some compute running on the data on the storage nodes, and then the heavy stuff will be done on our internal compute cluster.

That way we can scale it out easily, and we can also add it later. A customer who buys a Cloudian system may not know on day one that they’re going to be running a vector database. They may not want to at first, but somewhere down the road they’ll want to do this inferencing.
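The split between light on-node work and heavy work on a separately added compute cluster can be sketched as a placement rule. The `Job` and `Cluster` fields, the cost threshold and the `"deferred"` outcome are all hypothetical; the sketch shows one way to express “compute as close as practical to the data” while allowing the compute cluster to be absent on day one and added later.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    estimated_cost: float  # rough compute cost in a hypothetical unit
    needs_gpu: bool


@dataclass
class Cluster:
    on_node_limit: float = 0.5     # hypothetical threshold for work run on storage nodes
    compute_cluster_nodes: int = 0  # 0 means no compute cluster has been added yet


def place(job: Job, cluster: Cluster) -> str:
    """Hypothetical rule: light CPU work runs on the storage nodes holding the data;
    GPU or heavy work goes to the attached compute cluster, if one exists."""
    if not job.needs_gpu and job.estimated_cost <= cluster.on_node_limit:
        return "storage-node"
    if cluster.compute_cluster_nodes > 0:
        return "compute-cluster"
    return "deferred"  # the customer has not added compute capacity yet
```

Keeping the threshold and the compute-cluster size as separate settings reflects the design choice in the answer above: the storage footprint and the compute footprint can be sized independently, and heavy workloads such as building a vector index simply start running once compute is added.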

Blocks & Files: You could implement this in the cloud?

Michael Tso: Yes.

Blocks & Files: And your vision is?

Michael Tso: Our vision is that a customer can choose whatever inferencing modules they want on top of our platform. We will provide some, but they can provide others; it’s an open platform. So we will move beyond being your long-term data storage and archive platform. We’re already doing a lot of microservices and modern apps, so we’re already in tier-one storage. But when we get into this interesting AI world, we’re now partly application software. That makes sense because, essentially, the infrastructure is upleveling. If you are only doing storage a couple of years from now, you’re not going to be competitive.