February 2025

Claude 3.7 Sonnet and OpenAI o3

Anthropic introduces Claude 3.7 Sonnet with 'extended thinking'; OpenAI's o3 achieves human level on mathematical and scientific benchmarks.

Reasoning as a first-class capability

February 2025 saw two major releases that cemented reasoning as the defining frontier of AI. Anthropic released Claude 3.7 Sonnet with extended thinking — a mode in which the model explicitly shows its step-by-step reasoning process before giving a final answer. OpenAI released o3, the successor to o1, which achieved scores previously thought to require human expert knowledge on mathematical olympiad problems, the ARC-AGI benchmark, and PhD-level science questions.

Extended thinking

Claude 3.7 Sonnet's extended thinking allowed users to set a "thinking budget" — controlling how much reasoning the model performed before answering. On hard coding problems, mathematical reasoning, and multi-step logical tasks, extended thinking produced noticeably better results. It also made the model's reasoning transparent: users could see where it explored dead ends, corrected itself, and built toward a conclusion. Anthropic described this as a step toward more reliable, auditable AI reasoning.

OpenAI o3 and ARC-AGI

o3 achieved 87.5% on ARC-AGI (Abstract and Reasoning Corpus), a benchmark designed by François Chollet to test general fluid reasoning rather than memorization — a benchmark that o1 had scored only 32% on. It also achieved competitive performance on FrontierMath, a benchmark of novel mathematical problems compiled by professional mathematicians. These results reignited the debate about whether AI systems were approaching general intelligence or demonstrating increasingly sophisticated pattern matching at scale.


Sources

Ster Software

The most complete knowledge platform on artificial intelligence.

Kraaienjagersweg 24
7341 PT Beemte Broekland, Netherlands


© 2026 Ster Software BV · Chamber of Commerce 75474913

Content generated by Claude (Anthropic) · model: claude-sonnet-4-6

This website is built with Obelisk MCP Services by Ster Software.