Benchmarks won: Claude 3.5 Sonnet 0, Claude 3.7 Sonnet 2
| | Anthropic Claude 3.5 Sonnet | Anthropic Claude 3.7 Sonnet |
|---|---|---|
| **Overview** | | |
| Company | Anthropic | Anthropic |
| Release date | Jun 20, 2024 | Feb 24, 2025 |
| Model type | Not listed | Not listed |
| Open source | No | No |
| **Specifications** | | |
| Parameters | Not listed | Not listed |
| Context window | Not listed | Not listed |
| **Benchmarks** | | |
| Coding: SWE-Bench Verified (higher is better) | 33.4% | **62.3%** (best) |
| Science: GPQA Diamond (higher is better) | 59.4% | **68%** (best) |
| **Timeline** | | |
| Release gap | Claude 3.5 Sonnet shipped 249 days before Claude 3.7 Sonnet | |

SWE-Bench Verified: real coding tasks pulled from open-source projects, in which the AI has to find and fix actual bugs. It is a human-checked version of the original SWE-Bench.

GPQA Diamond: graduate-level science questions in biology, physics, and chemistry, hard enough that subject-matter PhDs score around 65%.
Claude 3.7 Sonnet leads Claude 3.5 Sonnet on both benchmarks the two models report in common (SWE-Bench Verified and GPQA Diamond). Because Claude 3.5 Sonnet shipped 249 days earlier, comparisons between them should account for the progress made in that interval.
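The 249-day release gap is plain calendar arithmetic on the two release dates in the table. Below is a minimal sketch using Python's standard `datetime` module; the helper name `release_gap_days` is hypothetical and chosen only for illustration.

```python
from datetime import date

# Release dates as listed in the comparison table.
CLAUDE_35_SONNET_RELEASE = date(2024, 6, 20)
CLAUDE_37_SONNET_RELEASE = date(2025, 2, 24)


def release_gap_days(earlier: date, later: date) -> int:
    """Return the number of calendar days from `earlier` to `later` (hypothetical helper)."""
    # Subtracting two dates yields a timedelta; .days is its whole-day count.
    return (later - earlier).days


gap = release_gap_days(CLAUDE_35_SONNET_RELEASE, CLAUDE_37_SONNET_RELEASE)
print(f"Release gap: {gap} days")  # Release gap: 249 days
```

Subtracting `date` objects handles month lengths and leap years automatically, so there is no need to count days per month by hand.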
Published specifications for these two models are limited — see each model page for the latest details.
On SWE-Bench Verified, Claude 3.7 Sonnet scores 62.3% versus 33.4% for Claude 3.5 Sonnet, a lead of 28.9 percentage points. On GPQA Diamond, Claude 3.7 Sonnet scores 68% versus 59.4% for Claude 3.5 Sonnet, a lead of 8.6 percentage points.
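The win tally and per-benchmark leads can be reproduced from the table's scores. The sketch below is a minimal illustration, assuming every benchmark is higher-is-better (true for both benchmarks here) and crediting ties to whichever model is listed first. The `SCORES` layout and the `compare` function are hypothetical, not part of any published tool.

```python
# Scores in percent, as reported in the comparison table.
SCORES: dict[str, dict[str, float]] = {
    "SWE-Bench Verified": {"Claude 3.5 Sonnet": 33.4, "Claude 3.7 Sonnet": 62.3},
    "GPQA Diamond": {"Claude 3.5 Sonnet": 59.4, "Claude 3.7 Sonnet": 68.0},
}


def compare(
    scores: dict[str, dict[str, float]],
) -> tuple[dict[str, int], dict[str, tuple[str, float]]]:
    """Count wins per model and record each benchmark's leader and lead in points.

    Assumes higher is better on every benchmark and at least two models per benchmark.
    """
    # Start every model at zero so models without a win still appear in the tally.
    wins = {model: 0 for by_model in scores.values() for model in by_model}
    leaders: dict[str, tuple[str, float]] = {}
    for bench, by_model in scores.items():
        # sorted() is stable, so on a tie the model listed first stays ahead.
        ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        (top_model, top_score), (_, second_score) = ranked[0], ranked[1]
        wins[top_model] += 1
        # Round to one decimal place to hide floating-point noise in the difference.
        leaders[bench] = (top_model, round(top_score - second_score, 1))
    return wins, leaders


wins, leaders = compare(SCORES)
print(wins)  # {'Claude 3.5 Sonnet': 0, 'Claude 3.7 Sonnet': 2}
for bench, (model, lead) in leaders.items():
    print(f"{bench}: {model} leads by {lead} points")
```

Rounding the difference matters because values such as `62.3 - 33.4` are not exact in binary floating point.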