DeepSeek-Math-V2  by deepseek-ai

LLM for self-verifiable mathematical reasoning

Created 4 days ago

1,042 stars

Top 36.0% on SourcePulse

Project Summary

Summary

DeepSeek-Math-V2 targets two limitations in the mathematical reasoning of current large language models: a correct final answer does not guarantee a sound derivation, and simple reward functions are inapplicable to complex tasks such as theorem proving. It is aimed at AI researchers and practitioners building more rigorous and trustworthy AI systems for scientific discovery. Its core benefit is progress toward self-verifiable mathematical reasoning, which improves the reliability and depth of AI-driven mathematical problem-solving.

How It Works

The project introduces a self-verification framework. First, an LLM-based verifier is trained specifically for theorem proving. A proof generator is then trained with the verifier's output as its reward signal, which incentivizes the generator to find and correct errors in its own proofs. To keep improving and to maintain the gap between generator and verifier capabilities, the system scales verification compute to automatically label complex, hard-to-verify proofs. Those labels become training data for further refining the verifier.
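The loop described above can be illustrated with a toy sketch. Nothing here comes from the DeepSeek-Math-V2 codebase: the `Proof` class, the "?" flaw marker, and the functions `verifier_score`, `refine`, `generate_with_self_verification`, and `auto_label` are all hypothetical stand-ins for real models, and the majority vote is only one assumed way to "scale verification compute."

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical toy setup: a proof is a list of steps, and a step that
# contains "?" is treated as flawed. Real LLMs are replaced by functions.


@dataclass
class Proof:
    steps: list[str]


def verifier_score(proof: Proof) -> float:
    """Toy verifier: fraction of steps without a flaw marker."""
    if not proof.steps:
        return 0.0
    sound = sum(1 for step in proof.steps if "?" not in step)
    return sound / len(proof.steps)


def refine(proof: Proof) -> Proof:
    """Toy self-correction: repair the first flawed step, if any."""
    steps = list(proof.steps)
    for i, step in enumerate(steps):
        if "?" in step:
            steps[i] = step.replace("?", "")
            break
    return Proof(steps)


def generate_with_self_verification(
    draft: Proof, max_rounds: int = 5
) -> tuple[Proof, float]:
    """Refine a draft until the verifier reward is perfect or rounds run out."""
    proof = draft
    reward = verifier_score(proof)  # the verifier output acts as the reward
    for _ in range(max_rounds):
        if reward == 1.0:
            break
        proof = refine(proof)
        reward = verifier_score(proof)
    return proof, reward


def auto_label(
    proof: Proof, verdict: Callable[[Proof], bool], n_samples: int = 9
) -> bool:
    """Assumed labeling scheme: majority vote over repeated verifier runs."""
    votes = sum(1 for _ in range(n_samples) if verdict(proof))
    return votes * 2 > n_samples
```

For example, `generate_with_self_verification(Proof(["a", "b?", "c?"]))` repairs one step per round and stops once the toy verifier reports no flaws. In a real system, the reward would drive a training update rather than a direct edit of the proof.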

Quick Start & Requirements

DeepSeek-Math-V2 is built on the DeepSeek-V3.2-Exp-Base model, which is available for download from HuggingFace. For inference instructions and support, see the DeepSeek-V3.2-Exp GitHub repository. The README does not specify hardware prerequisites (e.g., GPU, CUDA version) or any large datasets needed for setup or operation.

Highlighted Details

  • Achieves gold-level performance on IMO 2025 and CMO 2024.
  • Scores 118 out of 120 on Putnam 2024, a near-perfect result, using scaled test-time compute.
  • Shows that its self-verification mechanism improves theorem-proving capability.

Maintenance & Community

For technical questions or support, users are encouraged to raise an issue on the relevant GitHub repository or contact the developers directly via email at service@deepseek.com. The provided README does not list specific community channels such as Discord or Slack.

Licensing & Compatibility

The usage of DeepSeek-Math-V2 models is governed by a specific "Model License." The precise terms, conditions, and potential restrictions for commercial use or integration into closed-source systems are not detailed within the README and must be reviewed from the full license document.

Limitations & Caveats

The project acknowledges that "much work remains" in the pursuit of fully self-verifiable mathematical reasoning. The current implementation primarily focuses on theorem proving, and the specific constraints and permissions outlined in the "Model License" require careful examination by potential users, particularly for commercial applications.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 1,098 stars in the last 4 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 7 more.

reasoning-gym by open-thought

0.9%
1k
Procedural dataset generator for reasoning models
Created 10 months ago
Updated 2 weeks ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App) and Edward Sun (Research Scientist at Meta Superintelligence Lab).

LeanDojo by lean-dojo

0.7%
731
Machine learning for theorem proving in Lean
Created 2 years ago
Updated 2 months ago