AI Model Release Tracker - Timeline of Major AI Models from 2022-2026

Nonsense detection

BullshitBench v2

Given a confidently worded but nonsensical prompt, does the AI spot that it makes no sense and push back, instead of playing along and inventing an answer? The score is the percentage of prompts on which it clearly called out the nonsense. Higher is better.
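As a minimal sketch of how such a score could be computed, assume each response has already been graded as either a clear pushback or not. The function name and the example grades below are hypothetical and are not taken from the benchmark's actual pipeline.

```python
def pushback_rate(graded: list[bool]) -> float:
    """Return the percentage of responses graded as clearly calling out the nonsense.

    `graded` holds one boolean per nonsense prompt: True means the model
    clearly pushed back, False means it played along (assumed grading scheme).
    """
    if not graded:
        raise ValueError("need at least one graded response")
    return 100.0 * sum(graded) / len(graded)


# Hypothetical grades for five nonsense prompts (illustrative only).
example_grades = [True, True, False, True, False]
print(f"{pushback_rate(example_grades):.0f}%")  # 3 of 5 prompts flagged
```

A model that pushes back on every prompt would score 100%, and one that always plays along would score 0%.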

Rankings

Rank · Model · Developer · Release date · Score
1 · Claude Opus 4.8 · Anthropic · May 28 2026 · 95%
2 · Claude Sonnet 4.6 · Anthropic · Feb 17 2026 · 91%
3 · Claude Opus 4.5 · Anthropic · Nov 24 2025 · 90%
4 · Claude Opus 4.6 · Anthropic · Feb 5 2026 · 87%
5 · Claude Opus 4.7 · Anthropic · Apr 16 2026 · 83%
6 · Claude Sonnet 4.5 · Anthropic · Sep 29 2025 · 79%
7 · Claude Haiku 4.5 · Anthropic · Oct 15 2025 · 77%
8 · Kimi K2.6 · Moonshot AI · Apr 21 2026 · 65%
9 · Grok 4.20 Beta · xAI · Feb 17 2026 · 56%
9 · Claude Fable 5 · Anthropic · Jun 9 2026 · 56%
11 · Kimi K2.5 · Moonshot AI · Jan 27 2026 · 52%
12 · Claude 3.5 Haiku · Anthropic · Oct 22 2024 · 50%
12 · Grok 4.3 Beta · xAI · Apr 17 2026 · 50%
14 · Claude 3.7 Sonnet · Anthropic · Feb 24 2025 · 49%
15 · GPT-5.4 · OpenAI · Mar 5 2026 · 48%
16 · GPT-5.5 · OpenAI · Apr 23 2026 · 47%
17 · Claude 3.5 Sonnet · Anthropic · Jun 20 2024 · 45%
18 · Claude Opus 4.1 · Anthropic · Aug 5 2025 · 43%
19 · GPT-5.3-Instant · OpenAI · Mar 3 2026 · 40%
20 · GPT-5-Codex · OpenAI · Sep 15 2025 · 39%
21 · GPT-5.2 · OpenAI · Dec 11 2025 · 38%
22 · Gemini 3.1 Pro · Google · Feb 19 2026 · 37%
23 · GPT-5.5-Pro · OpenAI · Apr 23 2026 · 36%
24 · Claude Opus 4 · Anthropic · May 22 2025 · 34%
25 · Claude Sonnet 4 · Anthropic · May 22 2025 · 30%
26 · LLaMA 4 Maverick · Meta · Apr 5 2025 · 28%
27 · o3 · OpenAI · Apr 16 2025 · 26%
28 · GPT-5.1 · OpenAI · Nov 12 2025 · 25%
29 · GPT-5.3-Codex · OpenAI · Feb 5 2026 · 24%
30 · GPT-5 · OpenAI · Aug 7 2025 · 21%
31 · Gemini 2.5 Pro · Google · Mar 25 2025 · 20%
31 · Gemini 3.5 Flash · Google · May 19 2026 · 20%
33 · LLaMA 4 Scout · Meta · Apr 5 2025 · 19%
33 · Gemini 2.5 Flash · Google · Apr 17 2025 · 19%
33 · Grok 4.1 Fast · xAI · Nov 19 2025 · 19%
36 · DeepSeek-V4-Flash · DeepSeek · Apr 24 2026 · 18%
37 · Gemini 2.0 Flash · Google · Jan 30 2025 · 15%
38 · LLaMA 3.1 · Meta · Jul 23 2024 · 14%
38 · GPT-4.1 · OpenAI · May 14 2025 · 14%
38 · DeepSeek-V4-Pro · DeepSeek · Apr 24 2026 · 14%
41 · DeepSeek-V3.2 · DeepSeek · Dec 1 2025 · 13%
42 · GPT-4o · OpenAI · May 13 2024 · 12%
43 · gpt-oss-120b · OpenAI · Aug 5 2025 · 11%
43 · Gemini 3.1 Flash-Lite · Google · Mar 3 2026 · 11%
45 · Claude 3 Haiku · Anthropic · Mar 4 2024 · 10%
45 · Kimi K2 · Moonshot AI · Jul 11 2025 · 10%
47 · o4-mini · OpenAI · Apr 16 2025 · 8%
47 · DeepSeek R1-0528 · DeepSeek · May 28 2025 · 8%
49 · GPT-4o mini · OpenAI · Jul 18 2024 · 2%
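
Models with equal scores in the rankings share a rank, and the next rank is skipped (for example 9, 9, 11). The sketch below reproduces that numbering for a slice of the scores listed above. It assumes the scores are already sorted from highest to lowest, and the function name and `start` parameter are hypothetical, not part of the site.

```python
def competition_ranks(scores: list[int], start: int = 1) -> list[int]:
    """Assign ranks to scores sorted in descending order.

    Tied scores share a rank and the following rank is skipped, so the
    rank of an untied entry always equals its position in the list.
    """
    ranks: list[int] = []
    for i, score in enumerate(scores):
        if i > 0 and score == scores[i - 1]:
            ranks.append(ranks[-1])   # tie: reuse the previous rank
        else:
            ranks.append(start + i)   # no tie: rank equals position
    return ranks


# Scores from the entries ranked 8 through 14 in the list above.
print(competition_ranks([65, 56, 56, 52, 50, 50, 49], start=8))
# [8, 9, 9, 11, 12, 12, 14]
```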