ReProver  by lean-dojo

Theorem prover for Lean, augmented by retrieval

created 2 years ago
287 stars

Top 92.3% on sourcepulse

GitHubView on GitHub
Project Summary

ReProver provides retrieval-augmented language models for theorem proving in Lean 4. It addresses the challenge of generating correct and relevant tactics by leveraging a premise retrieval system to augment proof states, enabling more efficient and accurate automated theorem proving. The project is targeted at researchers and developers working with formal verification and automated reasoning systems.

How It Works

ReProver employs a two-stage approach. First, a premise retriever, based on a ByT5 encoder, embeds proof states and a corpus of relevant mathematical premises into vector representations. Cosine similarity is used to find the most relevant premises for a given proof state. Second, a tactic generator, a ByT5 encoder-decoder model, takes the proof state concatenated with the retrieved premises as input to generate a sequence of Lean tactics. This retrieval-augmented strategy aims to improve the quality and relevance of generated tactics by providing contextual information.

Quick Start & Requirements

  • Installation: Create a conda environment (conda create --yes --name ReProver python=3.11 ipython), activate it (conda activate ReProver), and install dependencies (pip install torch ...). Prepend the repo's root to PYTHONPATH.
  • Data: Download LeanDojo Benchmark 4 (python scripts/download_data.py), trace repos (python scripts/trace_repos.py), and log in to Weights & Biases (wandb login).
  • Prerequisites: Python 3.11, PyTorch (with CUDA support), transformers, deepspeed, pytorch-lightning, wandb, openai, rank_bm25, lean-dojo, vllm.
  • Resources: Training requires an NVIDIA A100 GPU with 80GB memory; smaller GPUs may necessitate adjustments to batch size and gradient accumulation.
  • Documentation: LeanDojo Website, Hugging Face Models

Highlighted Details

  • Offers pre-trained ByT5 models for tactic generation and premise retrieval on Hugging Face.
  • Supports training custom premise retrievers and tactic generators using Lightning CLI.
  • Includes scripts for evaluating premise retrieval (R@1, R@10, MRR) and theorem proving performance.
  • Provides utilities to convert PyTorch Lightning checkpoints to Hugging Face format for easier integration.

Maintenance & Community

  • The project is associated with the LeanDojo initiative and NeurIPS 2023.
  • Questions and discussions are handled via GitHub Discussions. Bug reports should be filed as GitHub Issues.

Licensing & Compatibility

  • The README does not explicitly state a license. The associated LeanDojo project uses an Apache 2.0 license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The main branch supports Lean 4; Lean 3 support is available on a legacy branch.
  • Training and evaluation procedures are complex, requiring significant setup and computational resources.
  • The README mentions that premise generation tactics may include markers (<a>, </a>) that need to be removed before use.
Health Check
Last commit

6 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
18 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.