miniF2F  by openai

Formal math benchmark for theorem proving systems

Created 4 years ago
392 stars

Top 73.4% on SourcePulse

GitHubView on GitHub
Project Summary

MiniF2F is a benchmark dataset for evaluating automated theorem proving systems in formal mathematics. It comprises olympiad and high-school level math problems translated across multiple formal systems, primarily Lean and Metamath, with support for HOL Light and Isabelle. The goal is to provide a standardized evaluation framework for comparing the performance of different theorem provers.

How It Works

The benchmark organizes mathematical problems into distinct files for each targeted formal system. Each file contains the problem statement, and optionally, example proofs and supporting lemmas. This structure allows for direct comparison of prover capabilities across different formalisms, facilitating research and development in AI-driven mathematical reasoning.

Quick Start & Requirements

  • Lean Installation: Requires elan (Lean's version manager).
  • Project Setup:
    git clone https://github.com/openai/miniF2F
    cd miniF2F
    leanpkg configure
    leanpkg build
    
  • Lean Files: Aggregated into valid.lean and test.lean for parsing efficiency.
  • Metamath Files: Each problem has a dedicated file.
  • Documentation: Pre-print

Highlighted Details

  • Contains 244 statements for Lean and Metamath, 165 for HOL Light, and 244 for Isabelle.
  • Problems are named according to competition year/number (e.g., imo-1990-p3) or dataset origin.
  • Supports validation and testing splits for system development and final evaluation.

Maintenance & Community

  • Contributions are welcomed via pull requests for new statements, proofs, or bug fixes.
  • The project is versioned, with v1 (August 2021) being the current stable release.
  • Active development occurs on the main branch; results should be reported against specific version branches.

Licensing & Compatibility

  • Lean folder: Apache License 2.0
  • Metamath folder: MIT License
  • HOL Light folder: FreeBSD License
  • Isabelle folder: Apache License 2.0
  • Permissive licenses facilitate commercial use and integration with closed-source systems.

Limitations & Caveats

The benchmark is a work in progress, with translations for HOL Light and Isabelle noted as still under development as of v1. Naming conventions are also subject to ongoing refinement.

Health Check
Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
8 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao Shizhe Diao(Author of LMFlow; Research Scientist at NVIDIA), Eric Zhu Eric Zhu(Coauthor of AutoGen; Research Scientist at Microsoft Research), and
7 more.

reasoning-gym by open-thought

1.2%
1k
Procedural dataset generator for reasoning models
Created 7 months ago
Updated 3 days ago
Starred by Dan Abramov Dan Abramov(Core Contributor to React; Coauthor of Redux, Create React App) and Edward Sun Edward Sun(Research Scientist at Meta Superintelligence Lab).

LeanDojo by lean-dojo

0%
705
Machine learning for theorem proving in Lean
Created 2 years ago
Updated 5 days ago
Feedback? Help us improve.