Code for replicating a math problem-solving solution
This repository provides the code and datasets for a winning solution to the AI Mathematical Olympiad - Progress Prize 1. It targets researchers and engineers in AI and mathematics, offering a robust framework for fine-tuning large language models on complex math problems using tool-integrated reasoning. The solution demonstrates a novel approach to generating high-quality training data and employs a self-consistency decoding algorithm for improved accuracy.
How It Works
The solution fine-tunes the DeepSeekMath-Base 7B model in two stages: first on Chain of Thought (CoT) data, then on Tool-Integrated Reasoning (TIR) data. The TIR data is generated with GPT-4 and interleaves reasoning with code execution feedback, strengthening the model's ability to solve problems that require symbolic manipulation and computation. At inference time, a self-consistency decoding algorithm (SC-TIR) samples multiple tool-integrated solution candidates and selects the final answer by majority vote, improving robustness.
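The SC-TIR idea can be sketched as follows. This is a minimal illustration, not the repository's implementation: sample several candidate solutions, execute the code each one produces, discard candidates whose code fails, and majority-vote over the executed answers. The `sample_solution` stub and its canned snippets are hypothetical stand-ins for the fine-tuned model.

```python
from collections import Counter

def sample_solution(problem: str, i: int) -> str:
    # Hypothetical stand-in for sampling the fine-tuned model: each
    # candidate is a code snippet that assigns its result to `answer`.
    demo = ["answer = 14 * 2", "answer = 28", "answer = 27", "answer = 1 / 0"]
    return demo[i % len(demo)]

def execute(snippet: str):
    # Tool-integration step: run the generated code and read back `answer`.
    scope = {}
    try:
        exec(snippet, scope)
        return scope.get("answer")
    except Exception:
        return None  # drop candidates whose code raises

def sc_tir(problem: str, n_samples: int = 4):
    # Self-consistency: execute every sampled candidate, filter out
    # failures, then majority-vote the surviving answers.
    answers = [execute(sample_solution(problem, i)) for i in range(n_samples)]
    answers = [a for a in answers if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None
```

With the canned candidates above, the failing `1 / 0` snippet is discarded and the vote resolves to 28. In the real pipeline, sampling with nonzero temperature plays the role of the stub, and the executed code runs in a sandboxed Python interpreter rather than a bare `exec`.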
Quick Start & Requirements
pip install -r requirements.txt

Flash Attention 2 is also required. Log in to the Hugging Face CLI (huggingface-cli login) before running.
Maintenance & Community
The project is associated with Numina, and its contributors include prominent figures in LLM research. The README does not document a community roadmap; the repository is marked inactive, with its last update about a year ago.
Licensing & Compatibility
The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The training process is resource-intensive, requiring a cluster of 8x H100 GPUs. Inference can run on less powerful hardware with quantization, but full performance requires high-end GPUs. The dataset generation pipeline also depends on GPT-4, which may introduce biases or limitations inherent to that model.