Scaling RAG with RAGOps and agents

COMMISSIONED: Retrieval-augmented generation (RAG) has become the gold standard for helping businesses refine their large language model (LLM) results with corporate data.

Whereas LLMs are typically trained on public information, RAG enables businesses to augment their LLMs with context- or domain-specific knowledge from corporate documents about products, processes or policies.
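The core pattern is simple: retrieve the most relevant corporate documents for a query, then prepend them to the prompt so the model answers from that context. The sketch below illustrates the idea with a toy word-overlap retriever and made-up documents; a production system would use vector embeddings and a real retriever instead.

```python
# Minimal sketch of the RAG pattern: retrieve relevant corporate
# documents, then build an augmented prompt grounded in them.
# Documents and the overlap scoring are illustrative assumptions.

DOCUMENTS = [
    "Refund policy: customers may return products within 30 days.",
    "Shipping: standard delivery takes 3-5 business days.",
    "Warranty: hardware is covered for one year from purchase.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved context."""
    context = "\n".join(retrieve(query, DOCUMENTS))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How long do customers have to return a product?"))
```

The augmented prompt, not any retraining, is what grounds the model's answer in corporate knowledge.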

RAG’s demonstrated ability to improve results for corporate generative AI services boosts employee and customer satisfaction, and with it overall performance, according to McKinsey.

Less clear is how to scale RAG across an enterprise, which would enable organizations to turbocharge their GenAI use cases. Early efforts to codify repeatable processes for spinning up new GenAI products and services with RAG have run into limitations that hurt performance and relevance.

Fortunately, near-term and medium-term solutions offer possible paths to ensuring that RAG can scale in 2025 and beyond.

RAGOps rising

LLMs that incorporate RAG require access to high-quality source data to retrieve from. However, ensuring the quality and availability of relevant data tends to be challenging because it is scattered across different departments, systems and formats.

To maximize their effectiveness, LLMs that use RAG also need to be connected to the sources from which departments wish to pull data: customer service platforms, content management systems, HR systems and the like. Such integrations require significant technical expertise, including experience with data mapping and API management.

Also, as RAG models are deployed at scale, they can consume significant computational resources and generate large amounts of data. Managing this requires the right infrastructure, the expertise to deploy it, and the ability to govern the data it supports across large organizations.

One approach to mainstreaming RAG that has AI experts buzzing is RAGOps, a methodology that helps automate RAG workflows, models and interfaces in a way that ensures consistency while reducing complexity.

RAGOps enables data scientists and engineers to automate data ingestion, model training and inference. It also addresses the scalability stumbling block by providing mechanisms for load balancing and distributed computing across the infrastructure stack. Monitoring and analytics run throughout every stage of the RAG pipeline, helping teams continuously refine models and operations.
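The defining trait is that every pipeline stage emits telemetry that feeds back into refinement. The sketch below shows that shape with toy ingest/index/serve stages instrumented for latency; the stage names, the inverted index and the metrics are illustrative assumptions, not a standard RAGOps API.

```python
# Sketch of a RAGOps-style pipeline: each stage (ingest, index, serve)
# is wrapped with monitoring so operations can be observed and tuned.
# Stage logic is deliberately toy-sized; a real stack would use a
# vector store and a metrics backend.

import time
from collections import defaultdict

METRICS: dict[str, list[float]] = defaultdict(list)

def monitored(stage: str):
    """Decorator that records per-stage latency for observability."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            METRICS[stage].append(time.perf_counter() - start)
            return result
        return inner
    return wrap

@monitored("ingest")
def ingest(raw_docs: list[str]) -> list[str]:
    # Normalize and deduplicate source documents.
    return sorted({d.strip().lower() for d in raw_docs})

@monitored("index")
def index(docs: list[str]) -> dict[str, set[str]]:
    # Build a toy inverted index (a vector store in practice).
    inv: dict[str, set[str]] = defaultdict(set)
    for d in docs:
        for word in d.split():
            inv[word].add(d)
    return inv

@monitored("serve")
def serve(query: str, inv: dict[str, set[str]]) -> list[str]:
    # Retrieve documents sharing any query term.
    hits: set[str] = set()
    for word in query.lower().split():
        hits |= inv.get(word, set())
    return sorted(hits)

idx = index(ingest(["Refund policy: 30 days.", "Shipping: 3-5 days."]))
print(serve("refund", idx))
print({stage: len(samples) for stage, samples in METRICS.items()})
```

Because each stage is independently instrumented, slow or drifting stages can be scaled out or retuned without touching the rest of the pipeline, which is the operational promise RAGOps makes.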

McKinsey, for instance, uses RAGOps to help its Lilli GenAI platform sift through 100,000 curated documents. Lilli has answered more than 8 million prompts logged by roughly three-quarters of McKinsey employees searching for tailored insights into operations.

The coming age of agentic RAG

As an operating model for organizations seeking to harness more value from their GenAI implementations, RAGOps promises to land well in organizations that have already adopted frameworks such as DevOps or MLOps.

Yet some organizations may take a more novel approach that follows the direction the GenAI industry is headed: marrying RAG with agentic AI, which would enable LLMs to adapt to changing contexts and business requirements.

Agents designed to execute digital tasks with minimal human intervention are drawing interest from businesses seeking to delegate more digital operations to software. Some 25 percent of organizations will implement enterprise agents by 2025, growing to 50 percent by 2027, according to Deloitte research.

Agentic AI with RAG will span many approaches and solutions, but most scenarios are likely to share some common traits.

For instance, individual agents will assess and summarize answers to prompts from a single document or even compare answers across multiple documents. Meta agents will orchestrate the process, managing individual agents and integrating outputs to deliver coherent responses.
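That division of labor can be sketched in a few lines: one worker agent answers from each document, and a meta agent fans the question out, then integrates the results. The functions below stand in for LLM-backed agents; the names, heuristic scoring and sample corpus are all illustrative assumptions.

```python
# Hedged sketch of agentic RAG orchestration: per-document agents
# produce candidate answers, and a meta agent selects and integrates
# them. Real agents would call an LLM rather than match substrings.

def document_agent(doc_id: str, text: str, question: str) -> dict:
    """Answer a question from a single document (toy heuristic)."""
    hits = [w for w in question.lower().split() if w in text.lower()]
    return {"doc": doc_id, "answer": text if hits else None, "score": len(hits)}

def meta_agent(question: str, corpus: dict[str, str]) -> str:
    """Fan the question out to per-document agents, then integrate."""
    results = [document_agent(d, t, question) for d, t in corpus.items()]
    answered = [r for r in results if r["answer"]]
    if not answered:
        return "No document addresses this question."
    best = max(answered, key=lambda r: r["score"])
    return f"Based on {best['doc']}: {best['answer']}"

corpus = {
    "policy.txt": "Refunds are issued within 30 days of purchase.",
    "faq.txt": "Shipping takes 3-5 business days.",
}
print(meta_agent("When are refunds issued?", corpus))
```

The meta agent is the natural place to add cross-document comparison or conflict resolution as the corpus grows.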

Ultimately, agents will work within the RAG framework to analyze, plan and reason in multiple steps, learning as they execute tasks and altering their strategies based on new inputs. This will help LLMs better respond to more nuanced prompts over time.

In theory, at least.

The bottom line

The future looks bright for GenAI technologies, which will flow from research labs to corporate AI factories, part of a burgeoning enterprise AI sector.

For example, the footprint of models will shrink even as they become more optimized to run efficiently on-premises and at the edge on AI PCs and other devices. RAG standardization, including software libraries and off-the-shelf tools, will grow.

Whether your organization is embracing RAGOps or adopting agentic AI, solutions are emerging to help scale RAG implementations.

Agentic RAG on the Dell AI Factory with NVIDIA, when applied to healthcare for example, helps reconcile the challenges of using structured data, such as patient schedules and profiles, alongside unstructured data, such as medical notes and imaging files, while maintaining compliance with HIPAA and other requirements.

That’s just one bright option. Many more are emerging to help light the way for organizations in the midst of their GenAI journey.

Learn more about Dell AI Factory with NVIDIA.

Brought to you by Dell Technologies.