Formal math benchmark for theorem proving systems
Top 75.9% on sourcepulse
MiniF2F is a benchmark dataset for evaluating automated theorem proving systems in formal mathematics. It comprises olympiad and high-school level math problems translated across multiple formal systems, primarily Lean and Metamath, with support for HOL Light and Isabelle. The goal is to provide a standardized evaluation framework for comparing the performance of different theorem provers.
How It Works
The benchmark organizes mathematical problems into distinct files for each targeted formal system. Each file contains the problem statement, and optionally, example proofs and supporting lemmas. This structure allows for direct comparison of prover capabilities across different formalisms, facilitating research and development in AI-driven mathematical reasoning.
Quick Start & Requirements
elan
(Lean's version manager).git clone https://github.com/openai/miniF2F
cd miniF2F
leanpkg configure
leanpkg build
valid.lean
and test.lean
for parsing efficiency.Highlighted Details
imo-1990-p3
) or dataset origin.Maintenance & Community
v1
(August 2021) being the current stable release.Licensing & Compatibility
Limitations & Caveats
The benchmark is a work in progress, with translations for HOL Light and Isabelle noted as still under development as of v1. Naming conventions are also subject to ongoing refinement.
1 year ago
1 week