Skill Leap AI
6:20 · 2/25/26

I Tested the First Diffusion Reasoning LLM… It’s Insanely Fast

TLDR

Mercury 2 is the first diffusion-based reasoning large language model, demonstrating significantly faster response times compared to other speed-optimized LLMs while maintaining strong reasoning capabilities.

Takeaways

Mercury 2 is the first reasoning LLM to use a diffusion model, generating tokens in parallel for unparalleled speed.

It is five times faster than other speed-optimized LLMs, producing 1,000 tokens per second while maintaining strong reasoning.

Ideal for API-driven applications like customer service, voice apps, and complex coding where speed and reasoning are paramount.

Mercury 2 introduces a revolutionary diffusion model approach to large language models, enabling parallel token generation and refinement instead of sequential processing, which results in substantially faster output. This model has been benchmarked as five times faster than competitors like Claude Haiku and OpenAI's speed-optimized models, producing 1,000 tokens per second. Its unique combination of speed and reasoning makes it particularly suitable for applications requiring instant responses and complex problem-solving.

Mercury 2 Introduction

00:00:00 Mercury 2 is a new diffusion-based reasoning large language model that operates on a fundamentally different principle than traditional LLMs. Unlike sequential token generation, Mercury 2 generates tokens in parallel, similar to how diffusion models refine images from noise. This 'editor-like' approach allows it to produce complex outputs, such as a working game of checkers code, almost instantly.
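The parallel-refinement idea above can be illustrated with a toy sketch: start from an all-masked sequence and fill several positions per refinement step, instead of emitting one token at a time left to right. This is a conceptual illustration only, not Mercury 2's actual decoding algorithm; the vocabulary, fill rate, and step logic are all made up for the demo.

```python
import random

def toy_diffusion_decode(length, fill_per_step, seed=0):
    """Toy illustration of parallel refinement: fill several masked
    positions per step, rather than one token at a time as an
    autoregressive model would. Purely conceptual."""
    rng = random.Random(seed)
    vocab = ["the", "board", "piece", "move", "jump", "king"]
    seq = ["<mask>"] * length
    steps = 0
    while "<mask>" in seq:
        masked = [i for i, t in enumerate(seq) if t == "<mask>"]
        # Fill up to `fill_per_step` positions in parallel this step.
        for i in rng.sample(masked, min(fill_per_step, len(masked))):
            seq[i] = rng.choice(vocab)
        steps += 1
    return seq, steps

seq, steps = toy_diffusion_decode(length=16, fill_per_step=4)
print(steps)  # 4 refinement steps instead of 16 sequential ones
```

Filling 4 of 16 positions per step finishes in 4 steps where a sequential decoder would need 16, which is the intuition behind the speedup, even though a real diffusion LLM revisits and refines tokens rather than fixing them on first fill.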

Speed and Benchmarks

00:01:25 Mercury 2 has been benchmarked against other speed-optimized LLMs, proving to be five times faster than models like Claude Haiku and OpenAI's fast models, generating 1,000 tokens per second. Examples, such as generating complex chess game code, demonstrate its ability to produce hundreds of lines of functional code rapidly, even with 'high' reasoning effort settings, outperforming competitors in speed for comparable code output.
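To put the quoted throughput in concrete terms, here is a back-of-the-envelope latency comparison. The ~10 tokens per line of code figure is a rough assumption for illustration, not from the video.

```python
def generation_time_seconds(tokens, tokens_per_sec):
    """Wall-clock time to emit `tokens` at a given throughput."""
    return tokens / tokens_per_sec

tokens = 300 * 10  # ~300 lines of code at a rough 10 tokens/line (assumption)
print(generation_time_seconds(tokens, 1000))  # 3.0 s at the quoted 1,000 tok/s
print(generation_time_seconds(tokens, 200))   # 15.0 s at a fifth of that speed
```

At five times the throughput, a several-hundred-line program arrives in seconds rather than a quarter of a minute, which is why the difference is visible in interactive demos.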

Key Features and Use Cases

00:04:22 The combination of speed and reasoning in Mercury 2 makes it exceptionally valuable for API integrations, particularly for AI-powered applications that demand instant responses. This includes customer service apps, voice assistants, and agent-based systems, where both rapid output and accurate, well-reasoned responses are critical. The pricing for Mercury 2 is also affordable, at 25 cents per million input tokens and 75 cents per million output tokens.
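The quoted rates make per-workload cost easy to estimate. The chat sizes below are illustrative assumptions; only the per-million-token prices come from the video.

```python
def cost_usd(input_tokens, output_tokens, in_rate=0.25, out_rate=0.75):
    """Cost at the quoted Mercury 2 rates: $0.25 per million input
    tokens and $0.75 per million output tokens."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# e.g. a support bot handling 10,000 chats of ~500 input / ~300 output tokens
print(round(cost_usd(10_000 * 500, 10_000 * 300), 2))  # 3.5
```

At these rates, ten thousand moderately sized conversations cost a few dollars, which is what makes high-volume API use cases like customer service plausible.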

Accessing Mercury 2

00:05:41 Users can currently test Mercury 2 through a provided playground link, which allows for experimentation with different reasoning levels and web access capabilities. An API is also available for developers interested in integrating Mercury 2 into their applications, especially for tasks requiring fast and intelligent responses like coding, search, and agentic workflows.
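For developers exploring the API, a request body might look like the sketch below. This assumes the endpoint follows the common OpenAI-style chat-completions shape, which is an assumption to verify against the provider's documentation; the model id and parameters are placeholders.

```python
import json

# Hypothetical request body (assumption: OpenAI-style chat-completions
# schema; "mercury-2" is a placeholder model id, not a confirmed name).
payload = {
    "model": "mercury-2",
    "messages": [
        {"role": "user", "content": "Write a working game of checkers in Python."}
    ],
    "max_tokens": 2000,
}
body = json.dumps(payload)
print(body[:30])
```

Whatever the exact schema, the point from the video stands: the same kind of request that drives coding, search, or agentic workflows elsewhere can be pointed at Mercury 2 when response latency matters.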