Large language models (LLMs) explained
An in-depth explanation of what large language models are, how they work, how they are trained, and what their limitations and capabilities are.
What is a large language model?
A Large Language Model (LLM) is a type of artificial intelligence trained on enormous amounts of text — think billions of web pages, books, scientific articles, and code. The model learns to recognize patterns in language and can thereby generate text, answer questions, summarize, translate, and reason.
The name 'large' refers to the number of parameters: the internal settings of the model. GPT-4 is estimated to have more than a trillion parameters. The more parameters, the more nuance the model can capture — but also the more computing power and energy is required.
How is an LLM trained?

Illustration created with Canva AI
Training takes place in multiple phases:
- Pre-training — the model learns the structure of language by processing billions of text fragments. The goal is simple: predict the next word. From this simple task, a surprising amount of understanding of grammar, facts, and reasoning patterns emerges.
- Fine-tuning — the base model is adjusted for a specific application, such as answering questions or writing code.
- RLHF (Reinforcement Learning from Human Feedback) — people rate the model's answers. Based on those ratings, the model learns what good answers are: helpful, honest, and safe.
How does an LLM generate text?
An LLM generates text token by token — a token is roughly a word or word part. At each step, the model calculates a probability distribution over all possible next tokens and picks one. This process repeats until the text is complete.
This explains why LLMs sometimes make mistakes: they statistically choose the most probable continuation, not necessarily the most correct one. There is no dictionary or database being consulted — everything is in the weights of the network.
Known models compared
- Claude (Anthropic) — known for long context (up to 200,000 tokens), strong reasoning behavior, and emphasis on safety. Used for complex analysis and coding work.
- GPT-4o (OpenAI) — versatile model combining text, image, and audio. Widely used in consumer products via ChatGPT.
- Gemini (Google DeepMind) — deeply integrated with Google services. Strong in multimodal tasks and searches.
- Llama 3 (Meta) — open-source model that anyone can freely download and modify. Popular for local execution and research.
- Mistral — European open-source alternative, efficient and fast. Suitable for applications where privacy and local processing are important.
What can LLMs do and not do?
Strong at:
- Writing, summarizing, and translating text
- Generating and debugging code
- Answering questions based on provided context
- Reasoning about complex problems in steps
- Structuring disorganized information
Weak at:
- Exact calculations (they do not calculate — they predict)
- Current information (a model has a knowledge cutoff, unless supplemented with search functions)
- Reliable citations (they can make up sources)
- Consistent behavior with subtly altered questions
Hallucinations
A well-known problem of LLMs is hallucinating: the model produces factually incorrect information with great confidence. This happens because the model never learned to say 'I don't know' — it always generates a plausibly sounding answer.
Solutions like RAG (Retrieval-Augmented Generation) link an LLM to a search engine or knowledge base, so answers are based on verified sources rather than solely on training data.
The future of LLMs
Development is rapid. Recent trends are:
- Reasoning models — models that think step by step before answering (such as OpenAI o3 and Claude 3.7 Sonnet)
- Multimodal models — text, image, audio, and video in one model
- Agentic AI — LLMs that independently execute tasks via tools and APIs
- Smaller, more efficient models — suitable for local execution on phone or laptop
Auteur: Claude claude-sonnet-4-6