What is RAG (Retrieval-Augmented Generation)?

RAG combines a language model with a retrieval step, typically a search over a document collection, so that answers can draw on accurate, up-to-date information. It is a key technique behind AI systems that answer questions from an organization's own documents and knowledge bases.

The problem with pure language models

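LLMs are trained on data up to a certain cutoff date, so they know nothing about later events and cannot answer questions about your internal documents, which were never part of the training data. They can also hallucinate, that is, produce fluent, plausible-sounding statements that are factually wrong.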

What is RAG?

Retrieval-Augmented Generation (RAG) connects the language model to an external knowledge source. For each question, relevant information is first retrieved, then given to the model together with the question.
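
To make "given to the model together with the question" concrete, here is a minimal sketch of how retrieved passages and a question might be combined into one prompt. The function name build_prompt, the instruction wording, and the example passages are assumptions made for illustration; real systems vary in how they phrase and structure this prompt.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    # Number each passage so the model can refer to it in its answer.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up example passages; in a real system these come from the retrieval step.
prompt = build_prompt(
    "How many vacation days do employees get?",
    ["Employees receive 25 vacation days per year.", "The office closes at 18:00."],
)
print(prompt)
```

The resulting string is what would be sent to the language model in place of the bare question.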

How does RAG work?

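Indexing: Documents are split into chunks, and each chunk is converted to a vector representation (an embedding) and stored in a vector index
Retrieval: The question is converted to a vector in the same way, and a vector search finds the chunks most similar to it
Generation: The model generates an answer based on the question and the retrieved chunks, which are supplied as context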
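
The indexing and retrieval steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the "embedding" here is a simple word-count vector standing in for a real embedding model, the index is a plain list rather than a vector database, and the function names (chunk, embed, cosine, build_index, retrieve) and sample documents are invented for the example.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: chunk every document and store each chunk with its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up sample documents for illustration.
documents = [
    "Employees receive 25 vacation days per year.",
    "The office is open from 8:00 to 18:00 on weekdays.",
    "Expense reports must be submitted within 30 days.",
]
index = build_index(documents)
top = retrieve(index, "How many vacation days do employees get?", k=1)
print(top)
# Generation (not shown): the retrieved chunks and the question would be
# passed to a language model, which writes the final answer.
```

With real embeddings, retrieval can also match passages that share meaning but not exact words with the question, which a word-count vector cannot do.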

Advantages

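Current information: the knowledge source can be updated without retraining the model
Fewer hallucinations: answers are grounded in retrieved text rather than in the model's memory alone
Your own data: internal documents can be used even though they were never part of the training data
Transparency via source references: answers can point to the passages they are based on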
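
Source references are possible because each retrieved passage can carry a note of where it came from. The sketch below shows one way this might look; the Passage class, the format_answer function, and the file names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str  # hypothetical file name or URL the passage came from


def format_answer(answer: str, passages: list[Passage]) -> str:
    """Append a deduplicated, numbered source list to a generated answer."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    sources = list(dict.fromkeys(p.source for p in passages))
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {s}" for i, s in enumerate(sources, start=1)]
    return "\n".join(lines)


# Made-up passages and answer for illustration.
retrieved = [
    Passage("Employees receive 25 vacation days per year.", "hr-handbook.pdf"),
    Passage("Unused days expire on 1 July.", "hr-handbook.pdf"),
    Passage("Part-time staff accrue days pro rata.", "faq.md"),
]
result = format_answer("Full-time employees get 25 vacation days per year.", retrieved)
print(result)
```

A reader can then open the listed sources and check the answer against them.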

RAG vs. fine-tuning

RAG suits information that changes frequently and cases where answers need source references; fine-tuning is the better fit for adjusting a model's behavior or tone.

Author: Claude claude-sonnet-4-6
