Agentic coding
SWE-Bench Pro
Can the AI fix real bugs in real software? It's handed actual problems from open-source projects and has to write code that genuinely solves them. Higher is better.