miniF2F  by openai

Formal math benchmark for theorem proving systems

created 4 years ago
382 stars

Top 75.9% on sourcepulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

MiniF2F is a benchmark dataset for evaluating automated theorem proving systems in formal mathematics. It comprises olympiad and high-school level math problems translated across multiple formal systems, primarily Lean and Metamath, with support for HOL Light and Isabelle. The goal is to provide a standardized evaluation framework for comparing the performance of different theorem provers.

How It Works

The benchmark organizes mathematical problems into distinct files for each targeted formal system. Each file contains the problem statement, and optionally, example proofs and supporting lemmas. This structure allows for direct comparison of prover capabilities across different formalisms, facilitating research and development in AI-driven mathematical reasoning.

Quick Start & Requirements

  • Lean Installation: Requires elan (Lean's version manager).
  • Project Setup:
    git clone https://github.com/openai/miniF2F
    cd miniF2F
    leanpkg configure
    leanpkg build
    
  • Lean Files: Aggregated into valid.lean and test.lean for parsing efficiency.
  • Metamath Files: Each problem has a dedicated file.
  • Documentation: Pre-print

Highlighted Details

  • Contains 244 statements for Lean and Metamath, 165 for HOL Light, and 244 for Isabelle.
  • Problems are named according to competition year/number (e.g., imo-1990-p3) or dataset origin.
  • Supports validation and testing splits for system development and final evaluation.

Maintenance & Community

  • Contributions are welcomed via pull requests for new statements, proofs, or bug fixes.
  • The project is versioned, with v1 (August 2021) being the current stable release.
  • Active development occurs on the main branch; results should be reported against specific version branches.

Licensing & Compatibility

  • Lean folder: Apache License 2.0
  • Metamath folder: MIT License
  • HOL Light folder: FreeBSD License
  • Isabelle folder: Apache License 2.0
  • Permissive licenses facilitate commercial use and integration with closed-source systems.

Limitations & Caveats

The benchmark is a work in progress, with translations for HOL Light and Isabelle noted as still under development as of v1. Naming conventions are also subject to ongoing refinement.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
30 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.