DeepSeek-Prover-V2  by deepseek-ai

LLM for formal theorem proving in Lean 4, trained on a cold-start dataset generated with DeepSeek-V3

Created 4 months ago
1,188 stars

Top 32.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

DeepSeek-Prover-V2 is an open-source large language model designed for formal theorem proving in Lean 4. It aims to advance formal mathematical reasoning by integrating informal reasoning with formal proof construction, targeting researchers and developers in automated theorem proving and formal verification.

How It Works

The model utilizes a recursive theorem proving pipeline powered by DeepSeek-V3. It first prompts DeepSeek-V3 to decompose complex theorems into subgoals and formalize these steps in Lean 4. A smaller 7B model handles proof search for individual subgoals. Successful proofs are then combined with DeepSeek-V3's chain-of-thought reasoning to create a cold-start dataset. This dataset is used for fine-tuning, followed by a reinforcement learning stage that uses binary correct-or-incorrect feedback to enhance the model's ability to bridge informal reasoning with formal proof generation.
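
A minimal Lean 4 sketch of the kind of subgoal decomposition described above (a hand-written illustration, assuming Mathlib provides `mul_self_nonneg` and `add_nonneg`; in the actual pipeline the decomposition is produced by DeepSeek-V3, not by hand):

```lean
-- The top-level goal is split into two named subgoals (h1, h2),
-- each simple enough for a smaller prover model to close independently.
theorem sum_sq_nonneg (a b : ℤ) : 0 ≤ a * a + b * b := by
  have h1 : 0 ≤ a * a := mul_self_nonneg a
  have h2 : 0 ≤ b * b := mul_self_nonneg b
  exact add_nonneg h1 h2
```

Each `have` step corresponds to a subgoal handed to the 7B proof-search model; the combined proof, paired with the chain-of-thought that produced the decomposition, becomes a cold-start training example.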

Quick Start & Requirements

  • Install/Run: Use Hugging Face's Transformers library for inference.
  • Prerequisites: Python, PyTorch. The 671B model requires significant computational resources.
  • Resources: The 671B model is based on DeepSeek-V3 architecture. The 7B model has an extended context length of 32K tokens.
  • Links: Model Download, Dataset Download, DeepSeek-V3 Docs
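
As a starting point, a hedged sketch of how a Lean 4 theorem (with its proof left as `sorry`) might be formatted into a completion prompt before being passed to the model via Transformers. The wrapper function and exact prompt wording are assumptions for illustration; consult the model card on Hugging Face for the official usage:

```python
# Hypothetical helper: wrap an unfinished Lean 4 theorem in a
# proof-completion prompt. The phrasing here is an assumption;
# the model card documents the exact expected format.
def build_prover_prompt(theorem: str) -> str:
    """Format a Lean 4 theorem ending in `sorry` for proof completion."""
    return (
        "Complete the following Lean 4 code:\n\n"
        "```lean4\n"
        f"{theorem}\n"
        "```\n"
    )

prompt = build_prover_prompt(
    "theorem add_comm' (a b : Nat) : a + b = b + a := by\n  sorry"
)
print(prompt)
# The resulting string would then be tokenized and passed to the model,
# e.g. via transformers.AutoModelForCausalLM.generate (not run here,
# since the 671B checkpoint requires substantial hardware).
```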

Highlighted Details

  • Achieves an 88.9% pass ratio on MiniF2F-test and solves 49 of 658 problems on PutnamBench.
  • Introduces ProverBench, a benchmark with 325 formalized problems including AIME competition math.
  • Available in 7B and 671B parameter sizes.
  • The 7B model is built on DeepSeek-Prover-V1.5-Base with a 32K token context length.

Licensing & Compatibility

  • Subject to the Model License. Specific terms are not detailed in the README.

Limitations & Caveats

The README does not specify any limitations or caveats regarding the model's performance, stability, or potential biases. The 671B model's substantial size implies significant hardware requirements for inference.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 30 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Thomas Wolf (Cofounder of Hugging Face), and 1 more.

LeanCopilot by lean-dojo

0.2%
1k
Copilot for theorem proving in Lean
Created 2 years ago
Updated 3 days ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App) and Edward Sun (Research Scientist at Meta Superintelligence Lab).

LeanDojo by lean-dojo

0%
705
Machine learning for theorem proving in Lean
Created 2 years ago
Updated 5 days ago