Research paper and code for LLM math reasoning scaling
This repository provides code and data for scaling mathematical reasoning in large language models, focusing on techniques such as Supervised Fine-Tuning (SFT) and Rejection sampling Fine-Tuning (RFT). It targets researchers and practitioners working on improving LLM performance in mathematical problem-solving, offering reproducible results and pre-trained checkpoints.
How It Works
The project compares the impact of different fine-tuning strategies on mathematical reasoning, including In-Context Learning (ICL), SFT, and RFT. In RFT, an SFT model samples multiple candidate reasoning paths per question, and only the distinct paths that reach the correct final answer are kept as additional fine-tuning data, with the aim of improving generalization. The work also investigates data augmentation, finding that query and response augmentation alone may not significantly help out-of-domain generalization.
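The rejection-sampling step described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `generate` callable, the `k` sample count, and the GSM8K-style `#### <answer>` extraction are all assumptions for the sake of the example.

```python
import re

def extract_answer(text):
    # Pull the final numeric answer from a GSM8K-style response ("... #### 42").
    m = re.search(r"####\s*(-?[\d,.]+)", text)
    return m.group(1).replace(",", "") if m else None

def rejection_sample(query, gold_answer, generate, k=8):
    """Sample k candidate reasoning paths and keep only correct, distinct ones.

    `generate` is a hypothetical callable wrapping the SFT model's sampler
    (temperature > 0, so repeated calls yield different reasoning paths).
    """
    kept, seen = [], set()
    for _ in range(k):
        response = generate(query)
        if extract_answer(response) != gold_answer:
            continue  # reject paths with a wrong final answer
        if response in seen:
            continue  # drop duplicate reasoning paths
        seen.add(response)
        kept.append(response)
    return kept  # used as additional fine-tuning data
```

The retained paths are then merged with the original SFT data for a further round of fine-tuning.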
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with research papers published in 2023, indicating recent activity. Key contributors are listed in the citations. No specific community links (Discord, Slack) are provided in the README.
Licensing & Compatibility
The repository contains code and data, with the underlying models likely subject to LLaMA's license. The license for the code and data themselves is not explicitly stated in the README, so commercial use may require further investigation.
Limitations & Caveats
Reproducing exact results may depend on specific library versions (Transformers <= 4.29). The README notes that query and response augmentation may not improve out-of-domain generalization. The project focuses on LLaMA and LLaMA-2 architectures.
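To satisfy the version constraint noted above, a reproduction environment can pin the library at install time. The exact patch release (4.29.2) is an assumption; any release satisfying the `<= 4.29` constraint should do.

```shell
# Pin transformers at or below 4.29, per the README's reproducibility note
pip install "transformers<=4.29.2" torch
```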