1 vs 1 benchmarks won
| | Anthropic Claude 3.5 Sonnet | Moonshot AI Kimi K2 Thinking |
|---|---|---|
| Overview | | |
| Company | Anthropic | Moonshot AI |
| Release date | Jun 20 2024 | Nov 6 2025 |
| Model type | — | — |
| Open source | No | Yes |
| Specifications | | |
| Parameters | — | 1T |
| Context window | — | 256k |
| Benchmarks | | |
| Science reasoning (GPQA Diamond) | 59.4% | — |
| Software engineering (SWE-Bench Verified) | 33.4% | 71.3% |
| Multimodal understanding (MMMU) | — | — |
| Timeline | | |
| Release gap | Claude 3.5 Sonnet shipped 504 days before Kimi K2 Thinking | |
Only one benchmark in this comparison, SWE-Bench Verified, has published scores for both Claude 3.5 Sonnet and Kimi K2 Thinking; GPQA Diamond is listed for Claude 3.5 Sonnet only, and MMMU for neither. Claude 3.5 Sonnet shipped 504 days before Kimi K2 Thinking, so benchmark comparisons should account for the intervening progress.
Kimi K2 Thinking is an open-source / open-weight model; Claude 3.5 Sonnet is proprietary.
On SWE-Bench Verified, Kimi K2 Thinking scores 71.3%, 37.9 points above Claude 3.5 Sonnet at 33.4%.
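
The two derived figures quoted here, the 504-day release gap and the 37.9-point SWE-Bench Verified difference, can be checked directly from the table values. The snippet below is a minimal sketch for that check; the variable names are illustrative and not part of any published tooling.

```python
from datetime import date

# Values copied from the comparison table (names are illustrative).
claude_release = date(2024, 6, 20)   # Claude 3.5 Sonnet release date
kimi_release = date(2025, 11, 6)     # Kimi K2 Thinking release date
swe_bench_verified = {
    "Claude 3.5 Sonnet": 33.4,
    "Kimi K2 Thinking": 71.3,
}

# Days between the two release dates.
release_gap_days = (kimi_release - claude_release).days

# Difference in percentage points, rounded to one decimal
# to avoid floating-point noise.
point_gap = round(
    swe_bench_verified["Kimi K2 Thinking"]
    - swe_bench_verified["Claude 3.5 Sonnet"],
    1,
)

print(release_gap_days)  # 504
print(point_gap)         # 37.9
```

Note that the gap is expressed in percentage points (71.3 minus 33.4), not as a relative improvement.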