gsm8k-ScRel by OFA-Sys

Research paper and code for LLM math reasoning scaling

created 2 years ago
268 stars

Top 96.5% on sourcepulse

Project Summary

This repository provides code and data for scaling mathematical reasoning in large language models, focusing on techniques like Supervised Fine-Tuning (SFT) and Rejection sampling Fine-Tuning (RFT). It targets researchers and practitioners working on improving LLM performance in mathematical problem-solving, offering reproducible results and pre-trained checkpoints.

How It Works

The project explores the impact of different fine-tuning strategies on mathematical reasoning capabilities, including In-Context Learning (ICL), SFT, and RFT. RFT, in particular, samples multiple reasoning paths per training question and keeps only the deduplicated paths that reach the correct final answer, which are then used as additional supervised data to improve generalization. The work also investigates data augmentation techniques, finding that query and response augmentation alone may not significantly help out-of-domain generalization.
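The RFT data-collection loop can be sketched as follows. This is a minimal illustration rather than the repository's actual training script: the checkpoint path, prompt template, and sampling hyperparameters are placeholders, and it assumes GSM8K-style solutions that end in "#### <answer>".

```python
# Minimal sketch of rejection-sampling data collection for RFT.
# Assumes a Hugging Face causal LM and GSM8K-style solutions ending in "#### <answer>".
# Checkpoint path, prompt template, and hyperparameters are illustrative, not the repo's scripts.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SFT_CHECKPOINT = "path/to/sft-checkpoint"  # placeholder: an SFT model to sample from

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    SFT_CHECKPOINT, torch_dtype=torch.float16, device_map="auto"
)

def extract_answer(text):
    """Pull the final numeric answer from a GSM8K-style solution ('... #### 42')."""
    match = re.search(r"####\s*(-?[\d,.]+)", text)
    return match.group(1).replace(",", "") if match else None

def rejection_sample(question, gold_answer, k=8, max_new_tokens=512):
    """Sample k reasoning paths; keep only deduplicated ones whose final answer is correct."""
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    generations = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        num_return_sequences=k,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    completions = tokenizer.batch_decode(
        generations[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    kept = {c.strip() for c in completions if extract_answer(c) == gold_answer}
    # The surviving (question, reasoning path) pairs become extra supervised fine-tuning data.
    return [{"question": question, "response": c} for c in kept]
```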

Quick Start & Requirements

  • Installation: Primarily uses shell scripts for training and inference.
  • Prerequisites: Requires specific versions of Transformers (<= 4.29 recommended for reproducibility), Python, and CUDA-enabled GPUs; training is multi-GPU (on the order of 8 GPUs for 7B/13B models, 16 for 33B, and 32 for 65B/70B models).
  • Resources: Training LLaMA models requires substantial GPU resources. Inference can be accelerated with vLLM (see the sketch after this list).
  • Links: https://github.com/OFA-Sys/gsm8k-ScRel
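If the repository's shell scripts are adapted for standalone use, vLLM-accelerated inference looks roughly like the sketch below; the checkpoint path and prompt template are placeholders, not the repo's exact format.

```python
# Batched GSM8K-style inference with vLLM's Python API — a sketch, not the repo's own script.
# The checkpoint path and prompt format are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/rft-or-sft-checkpoint")          # placeholder checkpoint path
params = SamplingParams(temperature=0.0, max_tokens=512)  # greedy decoding for evaluation

questions = [
    "Natalia sold clips to 48 of her friends in April, and then she sold half as many "
    "clips in May. How many clips did Natalia sell altogether in April and May?",
]
prompts = [f"Question: {q}\nAnswer:" for q in questions]

for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text.strip())
```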

Highlighted Details

  • Achieves significant accuracy improvements on GSM8K and MATH datasets with various LLaMA model sizes (e.g., 82.5% GSM8K with MuggleMATH-70B).
  • Provides detailed benchmarks for different fine-tuning methods (ICL, SFT, RFT) across model sizes.
  • Includes specific scripts for training and evaluation, tailored for LLaMA and LLaMA-2 models.
  • Offers pre-trained checkpoints for RFT-tuned models and MuggleMATH models (loading example below).
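Loading one of the released checkpoints with Transformers would look roughly like this; the model ID below is an assumed example of an RFT checkpoint name, so verify the exact names in the README.

```python
# Hypothetical example of loading a released RFT checkpoint with Transformers.
# The model ID is an assumption for illustration; check the README for the exact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "OFA-Sys/gsm8k-rft-llama7b-u13b"  # assumed example checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

prompt = ("Question: A robe takes 2 bolts of blue fiber and half that much white fiber. "
          "How many bolts in total does it take?\nAnswer:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```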

Maintenance & Community

The project is associated with research papers published in 2023; key contributors are listed in the citations. No community links (Discord, Slack) are provided in the README.

Licensing & Compatibility

No license is explicitly stated in the README for the code and data, and the fine-tuned checkpoints are derived from LLaMA and LLaMA-2, so they are presumably subject to Meta's model licenses. Commercial use may require further investigation.

Limitations & Caveats

Reproducing exact results may depend on specific library versions (Transformers <= 4.29). The README notes that query and response augmentation may not improve out-of-domain generalization. The project focuses on LLaMA and LLaMA-2 architectures.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 90 days
