Benchmarks won: Claude Sonnet 4.5 0, Muse Spark 3

| | Anthropic Claude Sonnet 4.5 | Meta Muse Spark |
|---|---|---|
| Overview | | |
| Company | Anthropic | Meta |
| Release date | Sep 29, 2025 | Apr 8, 2026 |
| Model type | n/a | n/a |
| Open source | No | No |
| Specifications | | |
| Parameters | n/a | n/a |
| Context window | n/a | n/a |
| Benchmarks | | |
| Coding: SWE-Bench Verified. Real coding tasks pulled from open-source projects; the AI has to find and fix actual bugs. A human-checked version of the original SWE-Bench. Higher is better. | 77.2% | 77.4% (Best) |
| Science: GPQA Diamond. Graduate-level science questions in biology, physics, and chemistry, hard enough that subject-matter PhDs score around 65%. Higher is better. | 83.4% | 89.5% (Best) |
| Multimodal: MMMU. Tests the AI on understanding images and text together across many college subjects. Higher is better. | 68% | 80.4% (Best) |
| Timeline | | |
| Release gap | Claude Sonnet 4.5 shipped 191 days before Muse Spark | |
Muse Spark leads Claude Sonnet 4.5 on all 3 benchmarks that both models report (SWE-Bench Verified, GPQA Diamond, MMMU). Claude Sonnet 4.5 shipped 191 days before Muse Spark, so benchmark comparisons should account for the intervening progress.
Published specifications for these two models are limited; see each model page for the latest details.
On SWE-Bench Verified, Muse Spark leads at 77.4% vs Claude Sonnet 4.5 at 77.2%, a margin of 0.2 percentage points. On GPQA Diamond, Muse Spark leads at 89.5% vs Claude Sonnet 4.5 at 83.4%, a margin of 6.1 points. On MMMU, Muse Spark leads at 80.4% vs Claude Sonnet 4.5 at 68%, a margin of 12.4 points.
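The win tally, the per-benchmark leads, and the 191-day release gap can all be rechecked from the figures on this page. The following is a minimal Python sketch, not the site's own method: the scores and dates are copied from the comparison above, while the helper names (`tally_wins`, `margins`, `release_gap_days`) are hypothetical and exist only for illustration.

```python
from datetime import date

# Scores (in percent) copied from the comparison on this page.
SCORES = {
    "SWE-Bench Verified": {"Claude Sonnet 4.5": 77.2, "Muse Spark": 77.4},
    "GPQA Diamond": {"Claude Sonnet 4.5": 83.4, "Muse Spark": 89.5},
    "MMMU": {"Claude Sonnet 4.5": 68.0, "Muse Spark": 80.4},
}

# Release dates copied from the comparison on this page.
RELEASES = {
    "Claude Sonnet 4.5": date(2025, 9, 29),
    "Muse Spark": date(2026, 4, 8),
}


def tally_wins(scores):
    """Count, per model, the benchmarks on which it has the highest score.

    Assumption: higher is better, and ties (none occur here) go to the
    first model listed.
    """
    wins = {}
    for results in scores.values():
        for model in results:
            wins.setdefault(model, 0)
        best = max(results, key=results.get)
        wins[best] += 1
    return wins


def margins(scores, leader, other):
    """Lead of `leader` over `other` in percentage points, per benchmark."""
    return {
        name: round(results[leader] - results[other], 1)
        for name, results in scores.items()
    }


def release_gap_days(releases, earlier, later):
    """Number of days between two release dates."""
    return (releases[later] - releases[earlier]).days


print(tally_wins(SCORES))
# {'Claude Sonnet 4.5': 0, 'Muse Spark': 3}
print(margins(SCORES, "Muse Spark", "Claude Sonnet 4.5"))
# {'SWE-Bench Verified': 0.2, 'GPQA Diamond': 6.1, 'MMMU': 12.4}
print(release_gap_days(RELEASES, "Claude Sonnet 4.5", "Muse Spark"))
# 191
```

The margins are differences in percentage points rather than relative improvements, which keeps the 0.2-point SWE-Bench Verified lead visibly small next to the 12.4-point MMMU lead.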