Benchmarks won: 1 (Claude Opus 4.5) vs 1 (Kimi K2.5)
| | Anthropic Claude Opus 4.5 | Moonshot AI Kimi K2.5 |
|---|---|---|
| Overview | ||
| Company | Anthropic | Moonshot AI |
| Release date | Nov 24 2025 | Jan 27 2026 |
| Model type | — | — |
| Open source | No | Yes |
| Specifications | ||
| Parameters | — | 1T |
| Context window | — | 256k |
| Benchmarks | ||
| Science reasoning (GPQA Diamond) | 87% | 87.6% |
| Software engineering (SWE-Bench Verified) | 80.9% | 76.8% |
| Multimodal understanding (MMMU) | — | — |
| Timeline | ||
| Release gap | Claude Opus 4.5 shipped 64 days before Kimi K2.5 | |
Claude Opus 4.5 and Kimi K2.5 split the two benchmarks for which both have published scores, each leading on one. Claude Opus 4.5 shipped 64 days before Kimi K2.5, so any benchmark comparison should take into account the progress made in that interval.
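The 64-day release gap can be checked directly from the two release dates in the table. This is a minimal sketch; the variable names are illustrative and not part of the comparison data.

```python
from datetime import date

# Release dates as listed in the comparison table.
opus_release = date(2025, 11, 24)  # Claude Opus 4.5
kimi_release = date(2026, 1, 27)   # Kimi K2.5

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
gap_days = (kimi_release - opus_release).days
print(f"Claude Opus 4.5 shipped {gap_days} days before Kimi K2.5")
```

The count is 6 remaining days in November, 31 in December, and 27 in January.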
Kimi K2.5 is an open-source / open-weight model; Claude Opus 4.5 is proprietary.
On GPQA Diamond, Kimi K2.5 scores 87.6%, 0.6 points above Claude Opus 4.5 at 87%. On SWE-Bench Verified, Claude Opus 4.5 scores 80.9%, 4.1 points above Kimi K2.5 at 76.8%.
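The margins and the 1 vs 1 tally above can be reproduced from the table with a few lines of arithmetic. The sketch below is only illustrative: the `scores` dictionary and the `compare` helper are hypothetical names, and a benchmark is skipped whenever either model has no listed score (as with MMMU).

```python
# Scores in percent, copied from the table; None marks a score that is not listed.
scores = {
    "GPQA Diamond": {"Claude Opus 4.5": 87.0, "Kimi K2.5": 87.6},
    "SWE-Bench Verified": {"Claude Opus 4.5": 80.9, "Kimi K2.5": 76.8},
    "MMMU": {"Claude Opus 4.5": None, "Kimi K2.5": None},
}

def compare(scores):
    """Return (wins per model, {benchmark: (leader, margin in points)})."""
    wins, margins = {}, {}
    for bench, by_model in scores.items():
        # Only compare benchmarks where every model has a published score.
        if any(value is None for value in by_model.values()):
            continue
        leader = max(by_model, key=by_model.get)
        trailer = min(by_model, key=by_model.get)
        # Round to one decimal to avoid floating-point noise in the difference.
        margin = round(by_model[leader] - by_model[trailer], 1)
        margins[bench] = (leader, margin)
        if margin > 0:  # a tie counts as a win for neither model
            wins[leader] = wins.get(leader, 0) + 1
    return wins, margins

wins, margins = compare(scores)
for bench, (leader, margin) in margins.items():
    print(f"{bench}: {leader} leads by {margin} points")
print(wins)
```

Counting wins treats a 0.6-point lead the same as a 4.1-point lead, so the margins are worth reading alongside the tally.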