miniF2F by openai

Formal math benchmark for theorem proving systems

Created 4 years ago

415 stars

Top 70.8% on SourcePulse

View on GitHub

2 Experts Love This Project

Victor Taelin

Author of Bend, Kind, HVM

Jinze Bai

Research Scientist at Alibaba Qwen

Project Summary

MiniF2F is a benchmark dataset for evaluating automated theorem proving systems in formal mathematics. It comprises olympiad and high-school level math problems translated across multiple formal systems, primarily Lean and Metamath, with support for HOL Light and Isabelle. The goal is to provide a standardized evaluation framework for comparing the performance of different theorem provers.

How It Works

The benchmark organizes mathematical problems into distinct files for each targeted formal system. Each file contains the problem statement, and optionally, example proofs and supporting lemmas. This structure allows for direct comparison of prover capabilities across different formalisms, facilitating research and development in AI-driven mathematical reasoning.

Quick Start & Requirements

Lean Installation: Requires elan (Lean's version manager).

Project Setup:

git clone https://github.com/openai/miniF2F
cd miniF2F
leanpkg configure
leanpkg build

Lean Files: Aggregated into valid.lean and test.lean for parsing efficiency.
Metamath Files: Each problem has a dedicated file.
Documentation: Pre-print

Highlighted Details

Contains 244 statements for Lean and Metamath, 165 for HOL Light, and 244 for Isabelle.
Problems are named according to competition year/number (e.g., imo-1990-p3) or dataset origin.
Supports validation and testing splits for system development and final evaluation.

Maintenance & Community

Contributions are welcomed via pull requests for new statements, proofs, or bug fixes.
The project is versioned, with v1 (August 2021) being the current stable release.
Active development occurs on the main branch; results should be reported against specific version branches.

Licensing & Compatibility

Lean folder: Apache License 2.0
Metamath folder: MIT License
HOL Light folder: FreeBSD License
Isabelle folder: Apache License 2.0
Permissive licenses facilitate commercial use and integration with closed-source systems.

Limitations & Caveats

The benchmark is a work in progress, with translations for HOL Light and Isabelle noted as still under development as of v1. Naming conventions are also subject to ongoing refinement.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

4 stars in the last 30 days