Veeam lays groundwork for AI agents to vacuum up backup data

Interview: Veeam is developing software to enable large language models (LLMs) and AI agents to trawl through reams of Veeam backup data and analyze it to answer user or agent requests. The work was disclosed by Rick Vanover, Veeam’s VP for Product Strategy in its Office of the CTO, during a briefing in London last week.

Blocks & Files: So here you are with Veeam, top of the data protection heap, and built on VMware to some degree. But now, with customers switching away from VMware as Broadcom changes its licensing terms, does Veeam intend to support all the alternative platforms those switchers adopt, so that you don’t lose any momentum with virtualized applications?

Rick Vanover, Veeam

Rick Vanover: Make no mistake, we started as VMware-only. I’m in my 15th year at Veeam, and when I started, that’s what it was: only VMware backup in 2011. We added Hyper-V in 2014, we added physical [servers] in 2016, we added Microsoft 365, and it just runs and runs. Recently we added Proxmox, and we just announced Scale Computing and HPE VM Essentials. So we’re expanding the hypervisor footprint. But are we going to help [people] move? Are we an evacuation tool? Absolutely not. We’ve got another platform planned, but we haven’t announced it yet. So we’re not done.

Blocks & Files: When considering virtualized server platforms, do you lean more toward enterprise adoption than small business?

Rick Vanover: It’s demand. That’s very accurate. But I will say we look at product telemetry, the percentages of who’s using what. The biggest beneficiary [of VMware migration] has been Hyper-V. That’s the biggest, and that makes sense. Others we’ve seen grow too, and we’re definitely looking at supporting those platforms as they grow. And we have an opportunity to have parity across them. VMware is by far the most capable stack, then Hyper-V, then Nutanix, and then it goes way down real quick.

Blocks & Files: Suppliers at the low end, like VergeIO, position themselves in the VMware-switcher target market. I imagine that, at the moment, they’re probably too small-scale for you to bother with.

Rick Vanover: The way we approach it is not just market share. We also have to look at what customers and partners are asking for, and whether the money is going to work out, and concurrently we assess the QA burden and the development cost to get there. With what we announced for HPE VM Essentials, those analyses were done and it was worth going in. So nothing to share on VergeIO, other than that we’re very aware.

Blocks & Files: The data Veeam backs up, especially for a large customer, is huge. AI needs access to proprietary data for RAG (retrieval-augmented generation), and the people implementing RAG in an enterprise would love to point it at one place, the one place where everything is: the backups. The question is: are you doing anything to help enterprises source RAG content from your backups?

Rick Vanover: Yes. Do we have anything fully productized and available to do that today? No. But what I would say on that is, we have the plumbing to do it. In fact, there’s a blur between something that’s fully productized and something you can do with a product, [something] semi-productized, semi-supported. We have something like that now with what we are calling Retro Hunter, which is a way to scan through all the backups.

It’s not AI, but it’s an example of looking at the data under management by Veeam. That use case is for threats; it’s not yet at the level of building data insights in the AI RAG world. We announced Veeam Guardian at VeeamON, and we did some showcasing of our Model Context Protocol work, but it’s not out yet. There are several other things in the works, make no mistake. That is absolutely where we are going.
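To make the RAG idea concrete, here is a minimal sketch of retrieval over text chunks assumed to have been extracted from backups. The sample chunks, the embedding model choice, and the retrieve helper are illustrative assumptions, not anything Veeam ships:

```python
# Minimal RAG retrieval sketch over text assumed to be extracted from backups.
# Illustrative only; nothing here is a Veeam API.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Pretend these chunks were pulled out of restored backup content.
chunks = [
    "Q3 sales report for the EMEA region, summary and regional breakdown.",
    "HR onboarding policy, revised in January, covering remote staff.",
    "Incident postmortem: the storage array lost power during failover.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # dot product equals cosine on normalized vectors
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks would be prepended to an LLM prompt as grounding context.
print(retrieve("What happened during the storage outage?"))
```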

Blocks & Files: With that in mind, if we look at an AI stack as a data source at the bottom and AI agents or large language models at the top, the agents have to look in a vector database for the vector embeddings. So there has to be an AI pipeline connecting the source, through some ETL procedures, to vectorize the data and put it in a database where agents can find it. So, if the Veeam backup repository’s contents are going to be made available to an AI data pipeline, where does Veeam’s interest stop? And where does the AI data pipeline involve other suppliers?

Rick Vanover: Let me highlight the vector conversation, the vector database point. That is actually something that we’ve been protecting already. In fact, at KubeCon last year, some of our engineers gave a session showing how you can protect vector data with our Kanister product.

Then if we go down to the data layer, we can be that data. And we’ve had a number of storage partners start to work on taking copies and moving them around to feed other AI solutions. At that point, we may be part of the data stack, while the rest of the application is homegrown or otherwise. But make no mistake, there will be a time when we have AI solutions that go the full distance there for organizations.
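For illustration, the ingestion half of the pipeline the question describes (extract, chunk, embed, store) might look like the sketch below. It uses the open source Chroma vector database as a stand-in for whatever store an enterprise runs; the file names, chunk size, and collection name are hypothetical:

```python
# Sketch of the ingestion side of an AI data pipeline: chunk exported text and
# store it in a vector database. Illustrative only; not a Veeam integration.
import chromadb

client = chromadb.Client()  # in-memory; a real pipeline would use persistent storage
collection = client.create_collection("backup_corpus")  # hypothetical name

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines split on document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Pretend these documents were exported from restored backups.
docs = {
    "outage_postmortem.txt": "Incident postmortem: the storage array lost power ...",
    "sales_q3.txt": "Q3 sales summary for the EMEA region ...",
}
for name, text in docs.items():
    pieces = chunk(text)
    collection.add(
        documents=pieces,  # Chroma embeds these with its default model
        ids=[f"{name}-{i}" for i in range(len(pieces))],
        metadatas=[{"source": name}] * len(pieces),
    )

# Later, an agent-facing layer queries the collection the same way.
hits = collection.query(query_texts=["storage outage"], n_results=2)
print(hits["documents"])
```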

That acquisition we did in September, Alcion, that is a team that’s absolutely working on the very deliverable you speak of. We showcased part of it at VeeamON with what we’re going to be doing with Anthropic and the Model Context Protocol for data under management by Veeam.

Blocks & Files: So you would have some kind of software layer above the data under management, and it would support MCP, so that an agent could come along and say: give me the data relating to this department over a period of time?

Rick Vanover: Exactly. That would be done in Veeam Data Cloud, our backup storage-as-a-service. In the example we showed at VeeamON, we did it with Microsoft 365 data. We used Model Context Protocol to crawl through years of data to answer questions about the business or the scenario. I know how to use Veeam, but what that demo produced in three minutes would’ve taken me a week. And that’s the result people want: to talk to their data.

That’s the goal and we showed a piece of that. We’re on a journey. We are working on it, [and] we gave a preview.
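For readers unfamiliar with Model Context Protocol, the sketch below shows what exposing data to an agent as an MCP tool can look like, using the official Python SDK (pip install "mcp[cli]"). The server name, the tool signature, and the stubbed lookup are invented for illustration and are not Veeam’s implementation:

```python
# Minimal MCP server sketch: expose a backup-search tool that an AI agent can
# call. Illustrative only; not Veeam's MCP work.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("backup-data")  # hypothetical server name

@mcp.tool()
def search_backups(department: str, start_date: str, end_date: str) -> str:
    """Summarize backed-up documents for a department within a date range."""
    # A real server would query a backup index here; this stub just echoes.
    return f"(stub) results for {department}, {start_date} to {end_date}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio; agents invoke tools via JSON-RPC tools/call
```

An agent connected to such a server could then satisfy the “give me the data relating to this department over a period of time” request by calling the tool with the relevant parameters.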