Benchmarks won: 1 vs 1

| | Anthropic Claude Sonnet 4.5 | Google Gemini 2.5 Pro |
|---|---|---|
| Overview | | |
| Company | Anthropic | Google |
| Release date | Sep 29 2025 | Mar 25 2025 |
| Model type | — | — |
| Open source | No | No |
| Specifications | | |
| Parameters | — | — |
| Context window | — | — |
| Benchmarks | | |
| GPQA Diamond (science reasoning) | 83.4% | 86.4% |
| SWE-Bench Verified (software engineering) | 77.2% | 59.6% |
| MMMU (multimodal understanding) | 68% | 68% |
| Timeline | | |
| Release gap | Gemini 2.5 Pro shipped 188 days before Claude Sonnet 4.5 | |
Claude Sonnet 4.5 and Gemini 2.5 Pro split the benchmarks they both publish, each leading on one and tying on a third. Gemini 2.5 Pro shipped 188 days before Claude Sonnet 4.5, so benchmark comparisons should account for the progress made in the intervening months.
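The 188-day gap follows directly from the two release dates in the table; a quick check with the standard library:

```python
from datetime import date

# Release dates from the comparison table above.
gemini_release = date(2025, 3, 25)   # Gemini 2.5 Pro
claude_release = date(2025, 9, 29)   # Claude Sonnet 4.5

gap_days = (claude_release - gemini_release).days
print(gap_days)  # 188
```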
Published specifications for these two models are limited — see each model page for the latest details.
On GPQA Diamond, Gemini 2.5 Pro scores 86.4%, 3 points above Claude Sonnet 4.5 at 83.4%. On SWE-Bench Verified, Claude Sonnet 4.5 scores 77.2%, 17.6 points above Gemini 2.5 Pro at 59.6%. On MMMU, both models score 68%.
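The per-benchmark margins above can be tabulated in a few lines; the dictionary layout here is just an illustrative structure holding the published scores:

```python
# Published scores from the comparison table (percent).
scores = {
    "GPQA Diamond": {"Claude Sonnet 4.5": 83.4, "Gemini 2.5 Pro": 86.4},
    "SWE-Bench Verified": {"Claude Sonnet 4.5": 77.2, "Gemini 2.5 Pro": 59.6},
    "MMMU": {"Claude Sonnet 4.5": 68.0, "Gemini 2.5 Pro": 68.0},
}

for bench, s in scores.items():
    delta = round(s["Claude Sonnet 4.5"] - s["Gemini 2.5 Pro"], 1)
    if delta > 0:
        leader = f"Claude Sonnet 4.5 by {delta} pts"
    elif delta < 0:
        leader = f"Gemini 2.5 Pro by {-delta} pts"
    else:
        leader = "tied"
    print(f"{bench}: {leader}")
```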