DeepSeek V3.1 Terminus vs Gemini 3.5 Flash
DeepSeek DeepSeek V3.1 Terminus | Google Gemini 3.5 Flash | |
|---|---|---|
| Overview | ||
| Company | DeepSeek | |
| Release date | Sep 22 2025 | May 19 2026 |
| Access | Open Weight | Proprietary |
| Benchmarks | ||
Nonsense detection BullshitBench v2Given a confidently-worded but nonsensical prompt, does the AI spot that it makes no sense and push back — instead of playing along and inventing an answer? The score is how often it clearly called out the nonsense. Higher is better. | — | 20% |
Agentic coding SWE-Bench ProCan the AI fix real bugs in real software? It's handed actual problems from open-source projects and has to write code that genuinely solves them. Higher is better. | — | 55.1% |
Agentic coding CursorBench v3.1Cursor's own test of harder, real-world coding tasks inside a code editor. Higher is better. | — | 49.8% |
Agentic terminal coding Terminal-Bench 2.1Can the AI work in a command-line terminal — running commands and finishing technical setup tasks the way a developer would? Higher is better. | — | 76.2% |
Multi-step tool use MCP AtlasCan the AI chain together many tools and steps to complete one bigger task, rather than doing just a single thing? Higher is better. | — | 83.6% |
General tool use ToolathlonTests how well the AI uses everyday real-world tools and apps to get things done. Higher is better. | — | 56.5% |
Multidisciplinary reasoning Humanity's Last Exam · no toolsHumanity's Last Exam — extremely hard expert questions across many subjects, written so you can't just look up the answer. “No tools” means the AI answers on its own. Higher is better. | — | 40.2% |
Abstract reasoning ARC-AGI-2Puzzle-style tests of abstract reasoning and pattern-finding — the kind of thing people find easy but AIs often struggle with. Higher is better. | — | 72.1% |
Agentic computer use OSWorld-VerifiedCan the AI actually operate a computer — clicking, typing, and using real apps — to finish tasks on its own? Higher is better. | — | 78.4% |
Agentic financial analysis Finance Agent v2Tests the AI on real financial-analysis work, like digging through reports and making sound decisions. Higher is better. | — | 57.9% |
Knowledge work GDPval-AAMeasures how well the AI does economically valuable knowledge work, judged against human experts. Shown as a rating (like a chess Elo) — higher is better. | — | 1656 |
Chart reasoning CharXiv ReasoningCan the AI read and reason about complex charts and figures, not just text? Higher is better. | — | 84.2% |
Multimodal reasoning MMMU-ProA tougher version of MMMU — college-level questions that mix images, diagrams, and text together. Higher is better. | — | 83.6% |
Spatial reasoning Blueprint-Bench 2Can the AI reason about space and layout — for example, understanding a floor plan or blueprint? Higher is better. | — | 33.6% |
Long context MRCR v2 (8-needle) · 128k averageTests whether the AI can find specific details buried inside a very long document (around 128k tokens — roughly a long book). Higher is better. | — | 77.3% |
Long context MRCR v2 (8-needle) · 1M pointwiseTests whether the AI can find specific details buried inside an enormous document (around 1 million tokens — many books). Higher is better. | — | 26.6% |
| Timeline | ||
| Release gap | DeepSeek V3.1 Terminus shipped 239 days before Gemini 3.5 Flash | |
Which is better: DeepSeek V3.1 Terminus or Gemini 3.5 Flash?
DeepSeek V3.1 Terminus and Gemini 3.5 Flash don't publish scores on any of the same benchmarks, so there's no direct head-to-head comparison. DeepSeek V3.1 Terminus shipped 239 days before Gemini 3.5 Flash, so benchmark comparisons should account for the intervening progress.
DeepSeek V3.1 Terminus is open weight, while Gemini 3.5 Flash is proprietary.
Direct benchmark comparisons are unavailable — DeepSeek V3.1 Terminus and Gemini 3.5 Flash don't publish scores on any of the same benchmarks.
Frequently asked questions
DeepSeek V3.1 Terminus was released by DeepSeek on Sep 22 2025.
Gemini 3.5 Flash was released by Google on May 19 2026.
DeepSeek V3.1 Terminus is an open weight model released by DeepSeek. Gemini 3.5 Flash is a proprietary model released by Google.
Other comparisons
DeepSeek V3.1 Terminus vs Claude Sonnet 5Gemini 3.5 Flash vs Claude Sonnet 5DeepSeek V3.1 Terminus vs GPT-5.6 SolGemini 3.5 Flash vs GPT-5.6 SolDeepSeek V3.1 Terminus vs Muse SparkGemini 3.5 Flash vs Muse SparkDeepSeek V3.1 Terminus vs Grok 4.3 BetaGemini 3.5 Flash vs Grok 4.3 BetaDeepSeek V3.1 Terminus vs Mistral Medium 3.5Gemini 3.5 Flash vs Mistral Medium 3.5DeepSeek V3.1 Terminus vs Kimi K2.7 CodeGemini 3.5 Flash vs Kimi K2.7 Code