What is reinforcement learning?
Reinforcement learning is an AI technique in which an agent learns by acting and receiving feedback. It has driven breakthroughs in chess, Go, and autonomous systems.
Learning by doing
Reinforcement learning (RL) is a learning method in which an AI agent learns by interacting with an environment. The agent takes actions, receives rewards or penalties based on the outcomes, and adjusts its behavior to maximize its total reward over time, much as a human or animal learns from experience.
Whereas supervised learning learns from labeled examples (input → desired output), RL needs no labeled training data. It learns through trial and error, from the rewards its own actions produce.
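The simplest setting that shows this trial-and-error loop is a multi-armed bandit: the agent repeatedly picks one of several actions and only ever sees the reward of the action it chose. Below is a minimal Python sketch, assuming an epsilon-greedy agent and three actions with hypothetical average rewards (0.2, 0.5 and 1.0) that the agent does not know. `run_bandit` and its parameters are illustrative names, not a library API. No labels are involved: the agent's value estimates come entirely from the rewards it collects.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (illustrative sketch).

    true_means: hypothetical average reward of each action, hidden from the agent.
    Returns the agent's estimated value of each action.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # agent's current guess of each action's value
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        # Explore a random action with probability epsilon,
        # otherwise exploit the action that currently looks best.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Environment: a noisy reward around the chosen action's true mean.
        reward = rng.gauss(true_means[action], 1.0)

        # Learn from feedback: move the estimate toward the observed reward
        # (incremental running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates

est = run_bandit([0.2, 0.5, 1.0])
best = max(range(len(est)), key=lambda a: est[a])
print(best)
```

With these settings the agent typically ends up rating the third action (index 2) highest. The `epsilon` parameter controls the trade-off between exploring actions it knows little about and exploiting the one that currently looks best.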

Illustration created with Canva AI
The core components
- Agent — the system that learns and makes decisions
- Environment — the world the agent interacts with
- State — the current situation of the environment
- Action — what the agent can do
- Reward — a numerical signal indicating how good the outcome of an action was
- Policy — the agent's strategy: which action to take in each state
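The components above can be mapped onto a small tabular Q-learning sketch in Python. It assumes a hypothetical five-cell corridor as the environment (start in cell 0, goal in cell 4), two actions (move left or right), a reward of +1 for reaching the goal and a small penalty of -0.01 per step. Q-learning is just one of many RL algorithms and is chosen here only for brevity; all names and hyperparameters are illustrative.

```python
import random

# Environment (hypothetical): a corridor of 5 cells; the state is the cell index.
N_STATES = 5
ACTIONS = (-1, +1)  # actions: move left or move right

def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small penalty for every other step

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learns Q[s][i], its estimate of the long-term reward of
    taking ACTIONS[i] in state s."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Policy: the highest-valued action in each non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]

q = train()
policy = greedy_policy(q)
print(policy)  # [1, 1, 1, 1]: always move right, toward the goal
```

The agent is never told the best route; it infers it from the rewards of its own episodes. The final policy is simply the action with the highest learned value in each state.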
Well-known breakthroughs
- AlphaGo (2016) — defeated Lee Sedol, one of the world's strongest Go players, in a game long considered too complex for traditional AI
- AlphaStar (2019) — reached Grandmaster level in StarCraft II
- OpenAI Five (2019) — defeated OG, the reigning Dota 2 world champions
- RLHF — Reinforcement Learning from Human Feedback is used to make language models safer and more useful
Challenges
RL is powerful but difficult to apply in practice:
- Reward design — defining the reward function correctly is hard; small errors can lead to undesired behavior
- Data hunger — RL often needs millions or even billions of interactions with the environment
- Instability — training can diverge or settle on suboptimal strategies
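A small calculation shows how a reward-design error can backfire. Assume a hypothetical task where the designer, hoping to guide the agent, gives +1 for every step spent next to the goal and +10 for reaching it (which ends the episode), with future rewards discounted by an assumed factor of 0.99 per step. The Python sketch below compares the discounted return of two behaviors; `discounted_return` is an illustrative helper, not a library function.

```python
GAMMA = 0.99  # discount factor (assumed for this example)

def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards, the reward at step t weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical, deliberately flawed reward design:
finish = [10.0]        # step into the goal immediately: +10, episode ends
hover = [1.0] * 1000   # stay next to the goal for 1000 steps: +1 each step

print(discounted_return(finish))          # 10.0
print(round(discounted_return(hover)))    # 100, roughly ten times more
```

Under this reward, hovering next to the goal is worth about ten times as much as finishing, so a return-maximizing agent would learn never to complete the task. Rewarding only the goal itself, or penalizing every extra step, removes the loophole.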
Applications beyond games
RL is applied to robot control, energy optimization in data centers (Google DeepMind reported cutting the energy used for cooling by up to 40% this way), self-driving cars, and drug dosing in intensive care.
Author: Claude claude-sonnet-4-6