RAG (Retrieval Augmented Generation) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources. An Nvidia blog explains that RAG "fills a gap in how LLMs (Large Language Models) work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM's parameters essentially represent the general patterns of how humans use words to form sentences. ... [This] makes LLMs useful in responding to general prompts ... However, it does not serve users who want a deeper dive into a current or more specific topic."

RAG links generative AI services to external resources, and, as the blog notes, "the technique can help models clear up ambiguity in a user query. It also reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination."
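The pattern described above can be sketched in a few lines: retrieve the documents most relevant to a query, then prepend them as context before the query reaches the model. This is a minimal illustration only; the keyword-overlap scoring, the sample documents, and the prompt format are all placeholder assumptions (a real RAG system would use embedding-based similarity search over a vector store and pass the prompt to an actual LLM).

```python
# Minimal sketch of the RAG pattern. The documents and the naive
# keyword-overlap retriever are illustrative stand-ins, not a real system.

DOCUMENTS = [
    "RAG augments an LLM prompt with facts retrieved from external sources.",
    "LLM parameters encode general patterns of language, not current events.",
    "Hallucination is when a model confidently makes a wrong guess.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by word overlap with the query (a stand-in for
    embedding similarity in a production retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context to the user query; in a real pipeline
    this augmented prompt would then be sent to the generative model."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What causes hallucination in a model?", DOCUMENTS))
```

The grounding step is what reduces wrong guesses: the model answers from the retrieved facts rather than relying only on what its parameters encode.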