On November 27, 2025, DeepSeek quietly open-sourced a new model, DeepSeek-Math-V2, on Hugging Face. It is a model focused on mathematical reasoning, theorem proving, and long-chain logical deduction, and the first fully open-source AI system in the industry to reach IMO (International Mathematical Olympiad) gold-medal level.
This release not only excited the open-source community but also ignited discussion across AI academia and engineering. Many overseas developers described DeepSeek's move as:
“The whales are back again.”
With GPT-5.1, Grok 4.1, and Gemini 3 all updated just weeks earlier, this mathematical model has made the competition fierce again.
01 Core highlight: not "getting the answer right", but "reasoning like a mathematician"
The key breakthrough of DeepSeek-Math-V2 is its shift from "result-oriented" to "proof-oriented" training.
Traditional math LLMs often rely on training against massive sets of labeled answers, but this has an inherent flaw:
A correct final answer ≠ correct reasoning steps.
In real mathematical work, especially theorem proving, the reasoning process matters far more than the answer.
Math-V2's training system takes the opposite approach: it teaches the model to review its own proofs.
Core technology: Generator–Verifier dual-model architecture
- Generator: produces draft proofs and constructs lemmas
- Verifier: reviews each step for logical consistency and structural integrity
- Together they form an "error localization + revision suggestion" loop
- The pattern resembles "mathematician writes a proof → reviewer finds mistakes → the manuscript is revised"
This is the key mechanism that lets Math-V2 sustain long reasoning chains and crack difficult theorem-proving problems.
Sources: Hugging Face model card; Sohu Technology article
02 Competition results: the first time an open-source model has reached IMO gold-medal level
DeepSeek-Math-V2's performance puts it straight at the top of the industry:
**IMO-ProofBench benchmark**
- Basic subset: ≈ 99% (highest in the industry)
- Advanced subset: 61.9% (close to the 65.7% of Google's Gemini DeepThink)
Sources: Tencent Technology; InfoQ analysis
**Results on international mathematics competitions**
In the evaluation published with the paper:
- IMO 2025 → gold-medal level
- CMO (Chinese Mathematics Olympiad) 2024 → gold medal level
- Putnam 2024 → 118/120 (a near-perfect score)
This is equivalent to saying:
The model already has the mathematical reasoning ability of the world's top competitors.
Sources: Xinhua News Agency (English); MarkTechPost
03 Why is this a "real breakthrough"?
Mathematical reasoning is one of the hardest areas for AI because:
- Long chains of logic must remain consistent
- Every step must be rigorous, with no room for error
- "Intuitive" statistical guesses are not allowed
- Results must be verifiable (which is hard for LLMs)
The significance of Math-V2 is that it provides a “self-verifiable framework for mathematical reasoning.”
This means AI can begin to:
- Tackle open math problems
- Carry out genuine theorem proofs
- Construct structured proof trees
- Automatically generate lemmas and check their consistency
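A structured proof tree of this kind can be represented as a simple recursive data structure. The sketch below is illustrative only, not Math-V2's internal representation: a proof node carries a claim, its justification, and the lemmas it depends on, and the proof is complete only when every node in the tree is justified.

```python
from dataclasses import dataclass, field

@dataclass
class ProofNode:
    claim: str
    justification: str = ""
    lemmas: list["ProofNode"] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A node is complete if it is justified and all of its lemmas are."""
        return bool(self.justification) and all(l.is_complete() for l in self.lemmas)

# A theorem whose supporting lemma has not yet been justified:
theorem = ProofNode(
    claim="sqrt(2) is irrational",
    justification="by contradiction, using the lemma below",
    lemmas=[ProofNode(claim="if p^2 is even then p is even")],
)
```

The recursive `is_complete` check mirrors the consistency pass described above: justifying the top-level claim is not enough until every generated lemma has been discharged as well.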
In a sense, it is less a regular LLM than a "mathematical reasoning operating system".
04 Technical architecture: more like a "team of mathematicians" than a single large model
The (simplified) inference flow of Math-V2 is as follows:
- Parse the task formally
- Generate a preliminary proof draft (Generator)
- Verify step by step (Verifier)
- Localize faults
- Regenerate
- Loop until the logic is solid
It is very much like:
"one hour to write a proof → one hour to review it → time to revise"
This lets Math-V2 exploit extended test-time compute, which is the core reason it comes close to top human performance on Putnam and the IMO.
05 Comparison with Google / OpenAI
A quick comparison:
| Model | Open source | Mathematical reasoning | IMO performance | Theorem-proving ability |
|---|---|---|---|---|
| DeepSeek-Math-V2 | ✔️ Fully open source | Strong (generate + verify) | Gold-medal level | Strong |
| Gemini DeepThink (IMO Gold) | ❌ Closed source | Strong | Gold medal | Strong |
| GPT-5.1 series | ❌ Closed source | Medium-strong | Not announced | Medium-strong |
| Grok 4.1 | ❌ Closed source | Medium | Not announced | Medium |
The biggest difference: Math-V2 is the only model with fully public weights, local deployability, and Mathematical Olympiad gold-medal performance.
Sources: SCMP; InfoQ (synthesis)
This makes it a major milestone for academia, automated mathematics, and symbolic reasoning research.
06 Open-source community reaction: why do people say "the whale is back"?
The overseas ML community widely regards DeepSeek-Math-V2 as one of the most striking open-source AI events of the year, because it:
- Rivals, and on IMO-ProofBench's basic subset surpasses, Gemini DeepThink (Google)
- Is completely open source
- Carries no hefty API costs
- Has "research-grade" mathematical reasoning ability
Some commenters even speculate: "DeepSeek may hit the code-LLM space with its next launch of programming models."
Source: Analytics India Magazine
References
- Hugging Face: DeepSeek-Math-V2
- SCMP: DeepSeek releases first open AI model with IMO-gold performance
- Xinhua: DeepSeek AI releases math model scoring IMO-level gold
- Gigazine: DeepSeek Math-V2 open-weight release
- InfoQ: DeepSeekMath-V2 self-verifying mathematical reasoning analysis
- Tencent News: DeepSeekMath-V2 Mathematical Olympiad gold-medal report
- China.com: DeepSeek launches gold-level mathematical model
- Sohu Technology: DeepSeekMath-V2 technical paper abstract
- MarkTechPost: Math-V2 scores 118/120 on Putnam
- Analytics India: DeepSeek joins OpenAI & Google at IMO level
- OSChina: DeepSeekMath-V2 open-source release
- Blog Garden (cnblogs): DeepSeekMath-V2 technical analysis