Coding
SWE-Bench Verified
Real coding tasks pulled from open-source projects: the AI has to find and fix actual bugs. A human-checked version of the original SWE-Bench. Scores are the percentage of tasks solved; higher is better.
Rankings
1. Claude Fable 5 · Anthropic · Jun 9 2026 · 95.5%
2. Claude Opus 4.7 · Anthropic · Apr 16 2026 · 87.6%
3. Claude Opus 4.5 · Anthropic · Nov 24 2025 · 80.9%
4. Claude Opus 4.6 · Anthropic · Feb 5 2026 · 80.8%
5. Gemini 3.1 Pro · Google · Feb 19 2026 · 80.6%
6. Kimi K2.6 · Moonshot AI · Apr 21 2026 · 80.2%
7. GPT-5.2 · OpenAI · Dec 11 2025 · 80%
8. Claude Sonnet 4.6 · Anthropic · Feb 17 2026 · 79.6%
9. Gemini 3.0 Flash · Google · Dec 17 2025 · 78%
10. Mistral Medium 3.5 · Mistral · Apr 29 2026 · 77.6%
11. Muse Spark · Meta · Apr 8 2026 · 77.4%
12. Claude Sonnet 4.5 · Anthropic · Sep 29 2025 · 77.2%
13. Kimi K2.5 · Moonshot AI · Jan 27 2026 · 76.8%
14. GPT-5.1 · OpenAI · Nov 12 2025 · 76.3%
15. Gemini 3.0 Pro · Google · Nov 18 2025 · 76.2%
16. Claude Opus 4.1 · Anthropic · Aug 5 2025 · 74.5%
17. Claude Haiku 4.5 · Anthropic · Oct 15 2025 · 73.3%
18. Claude Sonnet 4 · Anthropic · May 22 2025 · 72.7%
19. Claude Opus 4 · Anthropic · May 22 2025 · 72.5%
20. Kimi K2 Thinking · Moonshot AI · Nov 6 2025 · 71.3%
21. Kimi K2 · Moonshot AI · Jul 11 2025 · 65.8%
21. Kimi K2 (0905) · Moonshot AI · Sep 5 2025 · 65.8%
23. Claude 3.7 Sonnet · Anthropic · Feb 24 2025 · 62.3%
24. Gemini 2.5 Pro · Google · Mar 25 2025 · 59.6%
25. Claude 3.5 Haiku · Anthropic · Oct 22 2024 · 40.6%
26. Claude 3.5 Sonnet · Anthropic · Jun 20 2024 · 33.4%
27. Claude 3 Opus · Anthropic · Mar 4 2024 · 33%
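
The rankings give two models with 65.8% the same rank (21) and then skip to 23. That pattern matches what is usually called standard competition ranking. The sketch below shows one way such ranks could be assigned. The function name `competition_rank` and its `first_rank` parameter are hypothetical, chosen for illustration, and this is not necessarily how the leaderboard itself computes ranks.

```python
from typing import List, Tuple


def competition_rank(
    scores: List[Tuple[str, float]], first_rank: int = 1
) -> List[Tuple[int, str, float]]:
    """Rank entries by score, highest first (hypothetical helper).

    Tied scores share a rank, and the rank after a tie is skipped,
    e.g. 20, 21, 21, 23.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=first_rank):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # tie: reuse the previous entry's rank
        else:
            rank = position  # no tie: rank equals position in the list
        ranked.append((rank, name, score))
    return ranked


# Four rows copied from the rankings, starting at rank 20.
rows = [
    ("Kimi K2 Thinking", 71.3),
    ("Kimi K2", 65.8),
    ("Kimi K2 (0905)", 65.8),
    ("Claude 3.7 Sonnet", 62.3),
]

for rank, name, score in competition_rank(rows, first_rank=20):
    print(f"{rank}. {name} · {score}%")
```

Run on these four rows, the helper assigns ranks 20, 21, 21 and 23, the same numbering the rankings use for these models.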