DeepSeek Just Dropped TERMINUS: The Next Level Hybrid Model

TLDR

DeepSeek's new Terminus model significantly improves English language consistency and tool-use agent performance, maintaining aggressive pricing and an open-source license despite some trade-offs in raw coding and a potential political censorship angle.

Takeways

• Terminus significantly improves English language consistency and multi-step agent performance.

• It maintains highly competitive pricing and is open-source under the MIT license for commercial use.

• The model shows strong performance in tool-use and reasoning tasks, despite minor trade-offs in specific coding benchmarks.

DeepSeek has released Terminus, an upgrade to its v3.1 model, emphasizing a hybrid reasoning approach with enhanced agent capabilities. This model delivers notable improvements in English language consistency and external tool use, achieving higher scores on multi-step web search benchmarks like Browse Comp. While it remains highly competitive in pricing and offers an open-source license, some areas like competitive coding skills and Chinese language web performance show slight declines.

Terminus Model Overview

• 00:00:02 DeepSeek has released Terminus, an upgraded version of its v3.1 model, which further develops its hybrid reasoning approach by integrating agent-like functions that use external tools. The model demonstrates significantly improved consistency in handling English and Chinese languages, resolving issues with mixed language output and random characters found in previous versions. This release focuses on practical applications and reliable outputs through its enhanced agent capabilities.

Performance and Benchmarks

• 00:00:55 Terminus shows substantial improvements in agent reliability and tool-use, as evidenced by its performance on benchmarks. On Browse Comp, a multi-step live web search benchmark, Terminus scored 38.5, up from v3.1's 30, and on Terminal Bench, it climbed from 31.3 to 36.7. While English web performance was optimized, leading to a slight dip in Chinese language Browse Comp scores, Terminus still demonstrates meaningful gains in pure reasoning tasks, such as Simple QA and Sui Verified, indicating its enhanced capabilities in complex problem-solving.

Dual Mode Architecture

• 00:01:46 Terminus retains DeepSeek's dual-mode setup, featuring 'DeepSeek Chat' for straightforward tasks like conversations and JSON outputs with a max output of 8,000 tokens, and 'DeepSeek Reasoner' for more complex, multi-step problems with a max output of 64,000 tokens. Both modes can process up to 128,000 tokens of context. The system intelligently routes requests, sending tool-use tasks initiated in Reasoner mode back through the Chat model to optimize workflow, and supports developer-centric features like function calling and fill-in-the-middle completions.

Cost and Open-Source Model

• 00:05:57 DeepSeek continues its aggressive pricing strategy with Terminus, offering API pricing significantly lower than competitors like GPT-5 and Claude Opus 4.1. The model is also fully open source under the MIT license, a notable distinction from many proprietary models, allowing commercial use without additional costs. This approach lowers the barrier to entry for developers and provides flexibility for deployment, even though Terminus with 685 billion parameters matches or exceeds the performance of closed systems.

Real-World Application & Future

• 00:08:11 In real-world tests, Terminus excels at generating structured code for tasks like SAS landing pages and handles financial planning prompts effectively, though some found third-party routing through providers like OpenRouter to yield stronger answers. While it struggled with some creative coding tasks, like generating SVG for a butterfly, it impressively built a functional 3D Minecraft clone. Despite minor technical issues and a political angle due to potential censorship, Terminus is a solid upgrade, with DeepSeek v4 and a successor to R1 (R2) already anticipated.