Benchmarks won: Claude 3.5 Haiku 1, Kimi K2 Thinking 1

| | Anthropic Claude 3.5 Haiku | Moonshot AI Kimi K2 Thinking |
|---|---|---|
| Overview | | |
| Company | Anthropic | Moonshot AI |
| Release date | Oct 22, 2024 | Nov 6, 2025 |
| Model type | — | — |
| Open source | No | Yes |
| Specifications | | |
| Parameters | — | 1T |
| Context window | — | 256k tokens |
| Benchmarks | | |
| Science reasoning (GPQA Diamond) | 41.6% | — |
| Software engineering (SWE-Bench Verified) | 40.6% | 71.3% |
| Multimodal understanding (MMMU) | — | — |
| Timeline | | |
| Release gap | Claude 3.5 Haiku shipped 380 days before Kimi K2 Thinking | |

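The benchmark tally is tied at one apiece, but the two models share only one published score in this table: SWE-Bench Verified. GPQA Diamond lists a result for Claude 3.5 Haiku alone, so the tie is not a like-for-like comparison. Claude 3.5 Haiku also shipped 380 days before Kimi K2 Thinking, so benchmark comparisons should account for the intervening progress.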
Kimi K2 Thinking is an open-source / open-weight model; Claude 3.5 Haiku is proprietary.
On SWE-Bench Verified, Kimi K2 Thinking scores 71.3%, 30.7 points above Claude 3.5 Haiku at 40.6%.
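
The figures quoted above can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the helper names (`release_gap_days`, `shared_benchmarks`, `point_gap`) are assumptions made for this example, and the values are copied from the table rather than pulled from any official source.

```python
from datetime import date

# Values copied from the comparison table; None marks a score the table leaves blank.
RELEASES = {
    "Claude 3.5 Haiku": date(2024, 10, 22),
    "Kimi K2 Thinking": date(2025, 11, 6),
}
SCORES = {
    "GPQA Diamond": {"Claude 3.5 Haiku": 41.6, "Kimi K2 Thinking": None},
    "SWE-Bench Verified": {"Claude 3.5 Haiku": 40.6, "Kimi K2 Thinking": 71.3},
    "MMMU": {"Claude 3.5 Haiku": None, "Kimi K2 Thinking": None},
}


def release_gap_days(earlier: str, later: str) -> int:
    """Days between the two release dates."""
    return (RELEASES[later] - RELEASES[earlier]).days


def shared_benchmarks() -> list[str]:
    """Benchmarks for which every model has a published score."""
    return [
        name
        for name, row in SCORES.items()
        if all(score is not None for score in row.values())
    ]


def point_gap(benchmark: str, leader: str, trailer: str) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(SCORES[benchmark][leader] - SCORES[benchmark][trailer], 1)


gap_days = release_gap_days("Claude 3.5 Haiku", "Kimi K2 Thinking")
shared = shared_benchmarks()
swe_gap = point_gap("SWE-Bench Verified", "Kimi K2 Thinking", "Claude 3.5 Haiku")

print(gap_days)  # 380
print(shared)    # ['SWE-Bench Verified']
print(swe_gap)   # 30.7
```

Filtering on benchmarks where both models have a score keeps a missing result from being counted as a win for the other model.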