Benchmarks won: 0 (Claude 3.5 Haiku) vs 2 (Claude Sonnet 4)
| | Anthropic Claude 3.5 Haiku | Anthropic Claude Sonnet 4 |
|---|---|---|
| Overview | ||
| Company | Anthropic | Anthropic |
| Release date | Oct 22 2024 | May 22 2025 |
| Model type | — | — |
| Open source | No | No |
| Specifications | ||
| Parameters | — | — |
| Context window | — | — |
| Benchmarks | ||
| Coding: SWE-Bench Verified. Real coding tasks pulled from open-source projects, where the AI has to find and fix actual bugs. A human-checked version of the original SWE-Bench. Higher is better. | 40.6% | 72.7% (best) |
| Science: GPQA Diamond. Graduate-level science questions in biology, physics, and chemistry, hard enough that subject-matter PhDs score around 65%. Higher is better. | 41.6% | 75.4% (best) |
| Timeline | ||
| Release gap | Claude 3.5 Haiku shipped 212 days before Claude Sonnet 4 | |
Claude Sonnet 4 leads Claude 3.5 Haiku on both benchmarks that the two models report (SWE-Bench Verified and GPQA Diamond). Claude 3.5 Haiku shipped 212 days before Claude Sonnet 4, so benchmark comparisons should account for the progress made in the intervening months.
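The 212-day release gap can be checked from the two release dates in the table. The snippet below is a minimal sketch using Python's standard `datetime` module; the variable names are illustrative and do not come from the page.

```python
from datetime import date

# Release dates as listed in the Overview rows of the table.
haiku_release = date(2024, 10, 22)   # Claude 3.5 Haiku
sonnet_release = date(2025, 5, 22)   # Claude Sonnet 4

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
release_gap_days = (sonnet_release - haiku_release).days
print(release_gap_days)  # 212
```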
Published specifications for these two models are limited; see each model page for the latest details.
On SWE-Bench Verified, Claude Sonnet 4 leads at 72.7% vs Claude 3.5 Haiku at 40.6%, a margin of 32.1 percentage points. On GPQA Diamond, Claude Sonnet 4 leads at 75.4% vs Claude 3.5 Haiku at 41.6%, a margin of 33.8 percentage points.
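As a rough illustration of how the "benchmarks won" tally and the per-benchmark leads follow from the reported scores, here is a small sketch. The scores are copied from the table; the `summarize` helper and the dictionary layout are assumptions made for this example, not part of any published tooling.

```python
# Scores in percent, copied from the Benchmarks rows of the table.
scores = {
    "SWE-Bench Verified": {"Claude 3.5 Haiku": 40.6, "Claude Sonnet 4": 72.7},
    "GPQA Diamond": {"Claude 3.5 Haiku": 41.6, "Claude Sonnet 4": 75.4},
}

def summarize(scores):
    """Count benchmark wins per model and the leader's margin per benchmark.

    Hypothetical helper: assumes exactly two models per benchmark and
    that higher is better. Ties are not handled specially.
    """
    wins = {model: 0 for by_model in scores.values() for model in by_model}
    margins = {}
    for bench, by_model in scores.items():
        # Rank models on this benchmark from highest to lowest score.
        ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        (leader, top), (_, runner_up) = ranked[0], ranked[1]
        wins[leader] += 1
        # Margin in percentage points, rounded to hide float noise.
        margins[bench] = round(top - runner_up, 1)
    return wins, margins

wins, margins = summarize(scores)
print(wins)     # {'Claude 3.5 Haiku': 0, 'Claude Sonnet 4': 2}
print(margins)  # {'SWE-Bench Verified': 32.1, 'GPQA Diamond': 33.8}
```

The margins are differences in percentage points, not relative improvements.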