What is RAG (Retrieval-Augmented Generation)?
RAG combines a language model with a retrieval step (a search over a document collection) so that answers are more accurate and up to date. It is the key technology behind AI systems that answer questions from an organization's own documents and knowledge bases.
The problem with pure language models
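LLMs are trained on data up to a cutoff date, so they know nothing about later events and cannot answer questions about your internal documents. They can also hallucinate, that is, produce fluent statements that are factually wrong.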
What is RAG?
Retrieval-Augmented Generation (RAG) connects the language model to an external knowledge source. For each question, relevant information is first retrieved, then given to the model together with the question.
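To make "given to the model together with the question" concrete, here is a minimal sketch of how retrieved passages and a question might be combined into a single prompt. The function name `build_prompt`, the prompt wording, and the sample passages are assumptions made for illustration; real systems use their own templates.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt string."""
    # Number the passages so the model (and the reader) can refer to them.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages, as if they had been returned by a retrieval step.
passages = [
    "The office is closed on public holidays.",
    "Support hours are 9:00 to 17:00.",
]
prompt = build_prompt("When is support available?", passages)
print(prompt)
```

The resulting string is what would be sent to the language model in place of the bare question.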
How does RAG work?
- Indexing: Documents are split into chunks, and each chunk is converted to a vector representation (an embedding)
- Retrieval: A vector search finds the chunks most relevant to the question
- Generation: The model generates an answer based on the question and the retrieved context
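The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design. The `embed` function is a simple word-count vector standing in for a real embedding model, `generate` is a placeholder where a language model call would go, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: chunk every document and store each chunk with its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def generate(question: str, context: list[str]) -> str:
    """Generation: placeholder for a call to a language model."""
    return f"(model answer to {question!r} based on {len(context)} chunk(s))"


documents = [
    "Vacation requests must be submitted two weeks in advance.",
    "The VPN client is installed from the internal software portal.",
]
index = build_index(documents)
question = "How do I install the VPN client?"
top = retrieve(index, question, k=1)
print(top)
print(generate(question, top))
```

In a real system the word-count vectors would be replaced by embeddings from a trained model and the linear scan by a vector index, but the flow of data through the three steps stays the same.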
Advantages
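- Current information: the knowledge source can be updated without retraining the model
- Fewer hallucinations: answers are grounded in retrieved text rather than in the model's memory alone
- Own data: internal documents can be used without training the model on them
- Transparency via source references: answers can point to the passages they are based on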
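Source references are possible because each retrieved chunk can carry a pointer back to the document it came from. The following is a minimal sketch of that idea. The `RetrievedChunk` type, the `with_sources` helper, and the file names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    source: str  # hypothetical identifier, e.g. a file name or URL


def with_sources(answer: str, chunks: list[RetrievedChunk]) -> str:
    """Append a deduplicated source list, in retrieval order, to the answer."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    sources = list(dict.fromkeys(c.source for c in chunks))
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {s}" for i, s in enumerate(sources, start=1)]
    return "\n".join(lines)


chunks = [
    RetrievedChunk("Support hours are 9:00 to 17:00.", "support-handbook.txt"),
    RetrievedChunk("Support is closed on public holidays.", "support-handbook.txt"),
    RetrievedChunk("Escalations go to the on-call engineer.", "oncall-policy.txt"),
]
result = with_sources("Support is available 9:00 to 17:00 on working days.", chunks)
print(result)
```

Keeping the source next to the chunk text at indexing time is what lets a reader check the answer against the original document afterwards.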
RAG vs. fine-tuning
RAG is well suited to information that changes frequently and to answers that need source references; fine-tuning is the better choice for adjusting a model's behavior or tone.
Author: Claude claude-sonnet-4-6