What is RAG (Retrieval-Augmented Generation)?
RAG combines a language model with a search system so that the model can give more accurate and up-to-date answers. It is a key technique behind AI systems that work with an organization's own documents and knowledge bases.
The problem with pure language models
Large language models such as GPT-4 and Claude are trained on data up to a certain cutoff date. They know nothing about events after that date, and they cannot answer questions about your internal documents, products, or procedures unless you include that information in every prompt.
Moreover, language models can hallucinate: they produce plausible-sounding but incorrect information. This is dangerous in professional contexts.
What is RAG?
Retrieval-Augmented Generation (RAG) solves this by connecting the language model to an external knowledge source. For each question, relevant information is first retrieved from a database or document collection, and that information is given to the model together with the question.

Illustration created with Canva AI
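The core idea, handing the retrieved information to the model together with the question, comes down to building one combined prompt. The sketch below is a minimal illustration under assumptions: the function name `build_augmented_prompt`, the instruction wording, and the example passages are hypothetical, and the passages stand in for whatever a real retrieval step would return.

```python
def build_augmented_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into a single prompt."""
    # Number the passages so the answer can refer back to them.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Hypothetical passages standing in for the output of a retrieval step.
    passages = [
        "The support desk is open Monday to Friday, 9:00-17:00.",
        "Refunds are processed within 14 days.",
    ]
    print(build_augmented_prompt("When is the support desk open?", passages))
```

Telling the model to say when the context lacks the answer is one common way to discourage it from falling back on guesses; the exact wording is a design choice, not a fixed format.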
How does RAG work step by step?
- Indexing — Documents are split into pieces (chunks) and converted to vector representations via an embedding model
- Retrieval — A new question is converted to a vector with the same embedding model, and a vector similarity search finds the most relevant chunks
- Generation — The retrieved chunks are passed to the language model together with the question as context
- Answer — The model generates an answer grounded in the provided context rather than relying on its training data alone
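The four steps above can be sketched end to end in a few dozen lines. This is a toy version under clear assumptions: instead of a real embedding model it uses a sparse term-count vector with cosine similarity (so it only matches shared words, not synonyms or paraphrases), the chunk size and overlap are arbitrary, the documents are hypothetical, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


# Stand-in for an embedding model (assumption): a sparse term-count vector.
# Real RAG systems use a learned embedding model that also matches paraphrases.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Step 1a: split a document into overlapping windows of `size` words.

    `overlap` must be smaller than `size`.
    """
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Step 1 (indexing): store (source, chunk, vector) for every chunk."""
    return [(source, piece, embed(piece))
            for source, text in documents.items()
            for piece in chunk(text)]


def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Step 2 (retrieval): rank chunks by similarity to the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(source, piece) for source, piece, _ in ranked[:k]]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Step 3 (generation input): pass the retrieved chunks as context."""
    context = "\n".join(f"[{source}] {piece}" for source, piece in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Hypothetical example documents.
    docs = {
        "hours.txt": "The support desk is open Monday to Friday from 9:00 to 17:00.",
        "refunds.txt": "Refunds are processed within 14 days after the returned item arrives.",
    }
    index = build_index(docs)
    hits = retrieve(index, "When is the support desk open?", k=1)
    # Step 4 (answer): this prompt would be sent to a language model (not shown).
    print(build_prompt("When is the support desk open?", hits))
```

Overlapping chunks are a common choice so that a sentence cut at a chunk boundary still appears whole in at least one chunk; production systems typically replace the linear scan in `retrieve` with a vector database or approximate nearest-neighbour index.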
Advantages of RAG
- Current information — The knowledge source can be updated at any time without retraining the model
- Fewer hallucinations — Grounding answers in concrete, citable sources reduces (but does not eliminate) hallucinations
- Your own data — Works with internal documents, manuals, contracts, or knowledge bases
- Transparency — Answers can be accompanied by source references
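Two of these advantages, updating knowledge without retraining and citing sources, follow from the fact that the knowledge lives in an index next to the model. The sketch below illustrates this with a hypothetical in-memory `KnowledgeBase` class; it again uses term-count vectors as an assumed stand-in for a real embedding model, and the file names and prices are invented for the example.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Term-count vector as a stand-in for a real embedding model (assumption).
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical in-memory index; each entry keeps its source for citation."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, text: str) -> None:
        # Indexing a document takes effect immediately; the model is not retrained.
        self.entries.append((source, text, _vector(text)))

    def remove(self, source: str) -> None:
        self.entries = [e for e in self.entries if e[0] != source]

    def search(self, question: str, k: int = 1) -> list[tuple[str, str]]:
        q = _vector(question)
        ranked = sorted(self.entries, key=lambda e: _cosine(q, e[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.add("hours.txt", "The support desk is open on weekdays.")
    kb.add("pricing-2023.txt", "The basic plan costs 10 euros per month.")
    question = "What does the basic plan cost?"
    print(kb.search(question))

    # The price changes: swap the document in the index, no retraining needed.
    kb.remove("pricing-2023.txt")
    kb.add("pricing-2024.txt", "The basic plan costs 12 euros per month.")
    for source, text in kb.search(question):
        # Each retrieved passage carries its source, which can be shown as a reference.
        print(f"{text} (source: {source})")
```

In a real system the same pattern applies to a persistent vector store: updating a document means re-embedding and replacing its chunks, while the language model stays unchanged.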
Applications
- Customer service chatbots that work with product documentation
- Legal AI that can consult contracts and legislation
- Medical AI that can look up current guidelines
- Internal knowledge management tools
RAG vs. fine-tuning
A frequently asked question is: when do you choose RAG, and when fine-tuning? RAG is the better fit when the knowledge source changes regularly or when you need source references. Fine-tuning is the better fit when you want to permanently change the model's behavior or tone.
Author: Claude claude-sonnet-4-6