InternLM-Math by InternLM

Math LLM for bilingual reasoning tasks

created 1 year ago
518 stars

Top 61.5% on sourcepulse

Project Summary

InternLM-Math provides state-of-the-art bilingual Large Language Models (LLMs) specifically designed for mathematical reasoning. It functions as a solver, prover, verifier, and augmentor, targeting researchers and developers working on advanced mathematical AI capabilities. The models offer significant improvements in both informal and formal mathematical reasoning tasks.

How It Works

InternLM-Math models are continued pre-trained on approximately 100 billion math-related tokens and then fine-tuned on a large bilingual math dataset. A key innovation is the integration of Lean 4, a theorem proving language, enabling the models to generate Lean code for reasoning tasks and act as a reward model for verifying reasoning steps. This approach allows for verifiable mathematical reasoning and auto-formalization.
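As an illustration of the Lean 4 workflow, a natural-language statement such as "the sum of two even numbers is even" could be auto-formalized along the following lines. This is a hand-written sketch (assuming Mathlib is available), not actual model output, and it has not been machine-checked here:

```lean
import Mathlib.Tactic

-- Hand-written sketch of an auto-formalization target, not model output:
-- "the sum of two even numbers is even."
theorem even_add_even (a b : ℕ)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  -- a + b = 2 * m + 2 * n = 2 * (m + n)
  exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

A verifier model can then check whether a generated proof like this one actually closes the goal, which is what makes the reasoning verifiable rather than merely plausible.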

Quick Start & Requirements

  • Inference: LMDeploy (>=0.2.1) is recommended; the Hugging Face transformers library is also supported.
  • Dependencies: PyTorch, Transformers.
  • Resources: Models range from 1.8B to 8x22B parameters, requiring appropriate GPU memory.
  • Links: Paper, Hugging Face Checkpoints, Demo.
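The transformers path above can be sketched as follows. The checkpoint name and the ChatML-style prompt template are assumptions based on the InternLM2 model family; check the model card on Hugging Face for the exact identifiers and format.

```python
# Minimal sketch of querying an InternLM2-Math chat model via Hugging Face
# transformers. The checkpoint name and the chat template below are
# assumptions; consult the official model card before relying on them.

def build_prompt(question: str) -> str:
    """Wrap a math question in an InternLM2-style chat template (assumed format)."""
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

def solve(question: str, model_name: str = "internlm/internlm2-math-plus-7b") -> str:
    """Generate a solution. Requires a GPU with enough memory for the chosen model."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For production serving, the LMDeploy pipeline is the recommended route; the snippet above is only a quick local sanity check.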

Highlighted Details

  • Achieves state-of-the-art performance on benchmarks such as MiniF2F-test (43.4% for InternLM2-Math-Plus-7B), MATH (68.5% with Python for InternLM2-Math-Plus-Mixtral8x22B), and GSM8K (91.8% for InternLM2-Math-Plus-Mixtral8x22B).
  • Supports formal math reasoning via Lean 4 integration, including theorem proving and code generation.
  • Offers capabilities for math problem augmentation, code interpretation, and acting as a reward model for reasoning processes.
  • Models are available in various sizes (1.8B, 7B, 20B, 8x22B) and include base and chat-tuned versions.

Maintenance & Community

The project is actively developed by Shanghai AI Laboratory and associated researchers. Recent updates include new models like InternLM2.5-Step-Prover and datasets like Lean-Workbook.

Licensing & Compatibility

The project is released under the Apache 2.0 license, permitting commercial use and modification.

Limitations & Caveats

The models may skip calculation steps, perform poorly on certain Chinese fill-in-the-blank and English multiple-choice problems, and switch between languages mid-response due to the composition of the SFT data. Lean-related abilities may be limited to specific problem types.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 12 stars in the last 90 days
