Rubrik, the cyber-protection and resilience supplier, will use third-party technology to make a customer's protected and secured data available to Gen AI agents through its Annapurna project.
The agents will, we're told, be able to use Rubrik-generated copies of a customer's proprietary data to build retrieval-augmented generation (RAG) responses that are intended to be more accurate and relevant to user requests made within the customer's IT environment.
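Rubrik has not published Annapurna's programming interfaces, but the RAG pattern being described generally works as in the Python sketch below: retrieve relevant chunks of the customer's protected data, then pass them to an LLM as context. The retrieve_chunks hook, the prompt wording, and the Bedrock model ID are illustrative assumptions, not confirmed Annapurna details.

```python
# Minimal sketch of the RAG pattern described above. The retrieve_chunks
# callable and the Bedrock model ID are illustrative assumptions, not
# Rubrik's actual Annapurna interfaces.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def answer_with_rag(question: str, retrieve_chunks) -> str:
    """Retrieve relevant chunks of protected data, then ask an LLM to
    answer using only that retrieved context."""
    # retrieve_chunks stands in for a vector-similarity lookup over the
    # customer's secured data (see the embedding and vector-database
    # sketches further down).
    context = "\n\n".join(retrieve_chunks(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # Bedrock's Converse API is a uniform way to call models in its catalogue.
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model only
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```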
Rubrik's Annapurna will provide security-protected data from its Rubrik Security Cloud to large language model (LLM) AI agents from Amazon's Bedrock model store. These models need to access source data that has been transformed into vector embeddings stored in a vector database. More information has emerged about how this process will take place.
Chief Product Officer Anneka Gupta told us: “The vectorization will be initialized through Rubrik Security Cloud (RSC), which does not require customers to bring their own embedding model or vector database to facilitate. Through RSC, customers can leverage Amazon Bedrock and Azure OpenAI LLM and embedding models to create and deliver secure data embeddings for Gen AI apps.”
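Gupta did not say exactly which embedding models RSC will call on a customer's behalf, but creating an embedding through Amazon Bedrock looks broadly like this sketch; the Titan embedding model ID is simply one of Bedrock's stock options, not a confirmed Annapurna choice.

```python
# Sketch of turning a chunk of source data into a vector embedding via
# Amazon Bedrock. The Titan model ID is one example from Bedrock's catalogue,
# used purely for illustration.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Return a fixed-length vector representing the input text."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # e.g. a 1,024-dimension float vector
```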
That sorts out how data will be transformed into vectors. The vectors then have to be made available to LLMs responding to user requests.
Gupta added: "Embeddings will be stored in a vector database managed by Rubrik Security Cloud. The database itself may be through a third-party provider such as Pinecone or Azure AI Search, but the data is held as vectors and not chunk content."
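Taking Pinecone as the example, vector-only storage of the kind Gupta describes could look like the sketch below; the index name and ID scheme are assumptions for illustration. Only an identifier and the embedding are written to the database, so the chunk content itself stays inside Rubrik Security Cloud and is looked up by ID at query time.

```python
# Sketch of vector-only storage and lookup with Pinecone's Python client.
# The index name and ID scheme are illustrative assumptions; per Gupta,
# the database holds vectors, not the chunk content itself.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("annapurna-demo")  # hypothetical index name

def store_vector(chunk_id: str, vector: list[float]) -> None:
    # Only an ID and the embedding are upserted; the source chunk stays
    # in Rubrik Security Cloud and can be fetched by ID later.
    index.upsert(vectors=[{"id": chunk_id, "values": vector}])

def nearest_chunk_ids(query_vector: list[float], k: int = 5) -> list[str]:
    # Similarity search returns the IDs of the closest stored vectors.
    result = index.query(vector=query_vector, top_k=k)
    return [match.id for match in result.matches]
```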
Azure AI Search, previously called Azure Cognitive Search, is an information retrieval system for mixed content types ingested into a search index. Microsoft's documentation says it "is the recommended retrieval system for building RAG-based applications on Azure, with native LLM integrations between Azure OpenAI Service and Azure Machine Learning, and multiple strategies for relevance tuning."
The indexing process includes vectorization: “Rich indexing … This includes integrated data chunking and vectorization for RAG, lexical analysis for text, and optional applied AI for content extraction and enrichment.”
“Indexing is an intake process that loads content into your search service and makes it searchable. Internally, inbound text is processed into tokens and stored in inverted indexes, and inbound vectors are stored in vector indexes. The document format that Azure AI Search can index is JSON. You can upload JSON documents that you’ve assembled, or use an indexer to retrieve and serialize your data into JSON.”
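For completeness, pushing self-assembled JSON documents, including a vector field, into an Azure AI Search index and querying them with the azure-search-documents Python SDK looks roughly like this; the endpoint, index name, field names, and the tiny example vectors are all illustrative assumptions.

```python
# Sketch of uploading assembled JSON documents with a vector field to an
# Azure AI Search index, then running a vector query against it. Endpoint,
# index name, and field names are illustrative assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="annapurna-demo",                    # hypothetical index name
    credential=AzureKeyCredential("YOUR_API_KEY"),  # admin key needed for uploads
)

# Upload JSON documents you've assembled yourself; an indexer could instead
# retrieve and serialize source data into JSON, as the documentation notes.
search_client.upload_documents(documents=[
    # Vector truncated for illustration; a real vector must match the
    # index's configured dimension (e.g. 1,024).
    {"id": "chunk-001", "contentVector": [0.01, -0.23, 0.47]},
])

# Query the vector index with an embedding of the user's question.
results = search_client.search(
    search_text=None,
    vector_queries=[
        VectorizedQuery(
            vector=[0.02, -0.20, 0.45],  # query embedding, truncated
            k_nearest_neighbors=5,
            fields="contentVector",
        ),
    ],
)
for doc in results:
    print(doc["id"])
```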
We noted that Annapurna provides data embeddings generated from data stored in the Rubrik Security Cloud, meaning backed-up data. Could Annapurna also provide access to real-time data, meaning data created but not yet captured by the Rubrik Security Cloud?
Gupta said: “Today, Annapurna will leverage data, metadata and permissions stored in Rubrik Security Cloud, which refresh dynamically. We are not focused on real-time data sources outside of Rubrik Security Cloud at this time, however we are always looking at ways to enhance our innovations.”