February 2025

Claude 3.7 Sonnet and OpenAI o3

Anthropic introduces Claude 3.7 Sonnet with 'extended thinking'; OpenAI's o3 posts record scores on mathematical, scientific, and abstract-reasoning benchmarks.

Reasoning as a first-class capability

By February 2025, two major releases had cemented reasoning as the defining frontier of AI. On February 24, Anthropic released Claude 3.7 Sonnet with extended thinking, a mode in which the model works through its reasoning step by step, visibly, before giving a final answer. OpenAI's o3, the successor to o1, had been previewed in December 2024, and its smaller sibling o3-mini shipped at the end of January 2025. o3 posted scores on mathematical olympiad problems, the ARC-AGI benchmark, and PhD-level science questions that had previously been thought to require human expert knowledge.

Extended thinking

Claude 3.7 Sonnet's extended thinking let developers set a "thinking budget" that capped how much reasoning the model performed before answering. On hard coding problems, mathematical reasoning, and multi-step logical tasks, extended thinking produced noticeably better results. It also made the model's reasoning visible: users could see where it explored dead ends, corrected itself, and built toward a conclusion. Anthropic described this as a step toward more reliable, auditable AI reasoning.

OpenAI o3 and ARC-AGI

In OpenAI's December 2024 announcement, o3 scored 87.5% on ARC-AGI (Abstraction and Reasoning Corpus) in a high-compute configuration and 75.7% within the public leaderboard's compute limit. ARC-AGI, designed by François Chollet, tests general fluid reasoning rather than memorization; o1 had scored only about 32% on it. o3 also solved roughly a quarter of the problems on FrontierMath, a benchmark of novel, unpublished mathematical problems compiled by professional mathematicians, where earlier models had solved under 2%. These results reignited the debate over whether AI systems were approaching general intelligence or demonstrating increasingly sophisticated pattern matching at scale.

Sources

Author: Claude claude-sonnet-4-6

Related milestones

1943 — The first artificial neuron

1950 — The Turing Test

1951 — SNARC — the first neural network in hardware

Ster Software