Kimina-Prover-Preview by MoonshotAI

Research paper and resources for a formal reasoning model in Lean 4

created 3 months ago
320 stars

Top 86.0% on sourcepulse

View on GitHub
Project Summary

Kimina-Prover Preview introduces a novel approach to formal mathematical reasoning, targeting researchers and developers in automated theorem proving and formal verification. It achieves state-of-the-art performance on benchmarks like miniF2F by leveraging large language models trained with reinforcement learning, enabling human-like reasoning and rigorous theorem proving in Lean 4.

How It Works

The model employs a whole-proof generation strategy trained with reinforcement learning, eschewing search-heavy techniques such as Monte Carlo tree search. Combined with significant model scale (72B parameters) and an extended context window (32K tokens), this allows efficient learning and robust performance. A key innovation is the "Formal Reasoning Pattern," a structured reasoning style that bridges informal mathematical intuition with formal verification in Lean 4.
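As a purely illustrative sketch (not taken from the paper or the repository), the whole-proof setting asks the model to emit a complete, self-contained Lean 4 proof in a single pass, which the Lean checker then accepts or rejects as a unit:

```lean
import Mathlib

-- Illustrative only: a simple theorem of the kind a whole-proof
-- generation model would complete in one pass, rather than building
-- the proof tactic-by-tactic with external search.
theorem sum_of_squares_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have ha : 0 ≤ a ^ 2 := sq_nonneg a
  have hb : 0 ≤ b ^ 2 := sq_nonneg b
  linarith
```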

Quick Start & Requirements

  • Install: The distilled models and an autoformalization model are available on Hugging Face; the Kimina Lean Server is released separately (see the sketch after this list for loading a distilled model).
  • Prerequisites: Lean 4, specific Python versions, and potentially GPU resources for running the models.
  • Resources: Hardware requirements for running the 72B model are not explicitly stated but are expected to be substantial.
  • Links: HuggingFace, Demo, Kimina Lean Server, CombiBench
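
A minimal getting-started sketch, assuming standard Hugging Face transformers usage; the model id and prompt format below are illustrative assumptions, not confirmed by this summary, so check the project's Hugging Face page for the actual checkpoint names and recommended prompting:

```python
# Sketch only: sample a whole Lean 4 proof from a distilled checkpoint.
# The model id below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-MO/Kimina-Prover-Preview-Distill-7B"  # assumed id, verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompt with a Lean 4 theorem statement and let the model generate
# the complete proof in one pass (assumed prompt format).
prompt = "theorem my_add_comm (a b : ℕ) : a + b = b + a := by\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The generated proof text would then be checked with the Kimina Lean Server; whether the distilled models expect a bare theorem statement or a chat-style prompt is not stated here.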

Highlighted Details

  • Achieves 80%+ pass rate on miniF2F, outperforming prior works by a significant margin.
  • Demonstrates strong performance with high sample efficiency (e.g., 68.85% pass@32).
  • Identifies and releases corrections for at least 5 problems in the miniF2F-test dataset.
  • Utilizes a 32K token context window, reported as the longest used in neural theorem proving to date.

Maintenance & Community

The project is associated with the Numina & Kimi Team. Further community engagement channels are not explicitly listed in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The project is released as a "Preview," suggesting potential for ongoing development and changes. Specific hardware requirements for optimal performance, particularly for the larger models, are not fully detailed.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 50 stars in the last 90 days
