DeepSeek-Prover-V1.5 by deepseek-ai

Theorem prover for formal mathematics using Lean 4

Created 1 year ago

552 stars

Top 57.9% on SourcePulse

Project Summary

DeepSeek-Prover-V1.5 is an open-source language model for theorem proving in Lean 4, designed to improve upon its predecessor by optimizing training and inference. It targets researchers and practitioners in formal mathematics and AI, offering state-of-the-art performance on benchmarks like miniF2F and ProofNet.

How It Works

The model is pre-trained on formal mathematical languages and fine-tuned with supervised learning and reinforcement learning from proof assistant feedback (RLPAF). A key innovation is RMaxTS, a Monte-Carlo tree search variant employing intrinsic-reward-driven exploration for diverse proof path generation, moving beyond single-pass proof generation.

Quick Start & Requirements

Install: Clone repo, install dependencies (pip install -r requirements.txt), set up Lean 4, build Mathlib4.
Prerequisites: Linux, Python 3.10, Lean 4.
Inference: Use Huggingface Transformers.
Experiments: python -m prover.launch --config=configs/RMaxTS.py --log_dir=logs/RMaxTS_results
Links: Model Downloads, quick_start.py

Highlighted Details

Achieves 63.5% on miniF2F-test and 25.3% on ProofNet benchmarks.
Offers 7B parameter models: Base, SFT, and RL versions.
Integrates RL from proof assistant feedback and RMaxTS for enhanced proof search.
Supports commercial use via its model license.

Maintenance & Community

Primarily maintained by DeepSeek AI.
For questions and discussions, use GitHub Discussions.
Report bugs via GitHub Issues.

Licensing & Compatibility

Code repository: MIT License.
Model use: Subject to DeepSeekMath Model License.
Supports commercial use.

Limitations & Caveats

The model is specifically designed for Lean 4 theorem proving and may not generalize to other formal languages or domains without further adaptation. Performance claims are based on specific benchmark evaluations.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

1 stars in the last 30 days