Nonsense detection
BullshitBench v2
Given a confidently-worded but nonsensical prompt, does the AI spot that it makes no sense and push back — instead of playing along and inventing an answer? The score is how often it clearly called out the nonsense. Higher is better.
Rankings
Higher is better195%291%390%487%583%679%777%865%956%956%1152%1250%1250%1449%1548%1647%1745%1843%1940%2039%2138%2237%2336%2434%2530%2628%2726%2825%2924%3021%3120%3120%3319%3319%3319%3618%3715%3814%3814%3814%4113%4212%4311%4311%4510%4510%478%478%492%
Claude Opus 4.8
Anthropic · May 28 2026
Claude Sonnet 4.6
Anthropic · Feb 17 2026
Claude Opus 4.5
Anthropic · Nov 24 2025
Claude Opus 4.6
Anthropic · Feb 5 2026
Claude Opus 4.7
Anthropic · Apr 16 2026
Claude Sonnet 4.5
Anthropic · Sep 29 2025
Claude Haiku 4.5
Anthropic · Oct 15 2025
Kimi K2.6
Moonshot AI · Apr 21 2026
Grok 4.20 Beta
xAI · Feb 17 2026
Claude Fable 5
Anthropic · Jun 9 2026
Kimi K2.5
Moonshot AI · Jan 27 2026
Claude 3.5 Haiku
Anthropic · Oct 22 2024
Grok 4.3 Beta
xAI · Apr 17 2026
Claude 3.7 Sonnet
Anthropic · Feb 24 2025
GPT-5.4
OpenAI · Mar 5 2026
GPT-5.5
OpenAI · Apr 23 2026
Claude 3.5 Sonnet
Anthropic · Jun 20 2024
Claude Opus 4.1
Anthropic · Aug 5 2025
GPT-5.3-Instant
OpenAI · Mar 3 2026
GPT-5-Codex
OpenAI · Sep 15 2025
GPT-5.2
OpenAI · Dec 11 2025
Gemini 3.1 Pro
Google · Feb 19 2026
GPT-5.5-Pro
OpenAI · Apr 23 2026
Claude Opus 4
Anthropic · May 22 2025
Claude Sonnet 4
Anthropic · May 22 2025
LLaMA 4 Maverick
Meta · Apr 5 2025
o3
OpenAI · Apr 16 2025
GPT-5.1
OpenAI · Nov 12 2025
GPT-5.3-Codex
OpenAI · Feb 5 2026
GPT-5
OpenAI · Aug 7 2025
Gemini 2.5 Pro
Google · Mar 25 2025
Gemini 3.5 Flash
Google · May 19 2026
LLaMA 4 Scout
Meta · Apr 5 2025
Gemini 2.5 Flash
Google · Apr 17 2025
Grok 4.1 Fast
xAI · Nov 19 2025
DeepSeek-V4-Flash
DeepSeek · Apr 24 2026
Gemini 2.0 Flash
Google · Jan 30 2025
LLaMA 3.1
Meta · Jul 23 2024
GPT-4.1
OpenAI · May 14 2025
DeepSeek-V4-Pro
DeepSeek · Apr 24 2026
DeepSeek-V3.2
DeepSeek · Dec 1 2025
GPT-4o
OpenAI · May 13 2024
gpt-oss-120b
OpenAI · Aug 5 2025
Gemini 3.1 Flash-Lite
Google · Mar 3 2026
Claude 3 Haiku
Anthropic · Mar 4 2024
Kimi K2
Moonshot AI · Jul 11 2025
o4-mini
OpenAI · Apr 16 2025
DeepSeek R1-0528
DeepSeek · May 28 2025
GPT-4o mini
OpenAI · Jul 18 2024