gsm8k-ScRel by OFA-Sys

Research paper and code for LLM math reasoning scaling

created 2 years ago
268 stars

Top 96.5% on sourcepulse

Project Summary

This repository provides code and data for scaling mathematical reasoning in large language models, focusing on techniques like Supervised Fine-Tuning (SFT) and Rejection sampling Fine-Tuning (RFT). It targets researchers and practitioners working on improving LLM performance in mathematical problem-solving, offering reproducible results and pre-trained checkpoints.

How It Works

The project explores the impact of different fine-tuning strategies on mathematical reasoning capabilities, including In-Context Learning (ICL), SFT, and RFT. RFT, in particular, samples multiple reasoning paths per training question and keeps only the deduplicated paths that reach the correct final answer, which are then used as additional supervised data to improve generalization. The work also investigates data augmentation techniques, finding that query and response augmentation alone may not significantly help out-of-domain generalization.
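The RFT data-collection loop can be sketched as follows. This is a minimal illustration rather than the repository's actual training script: the checkpoint path, prompt template, and sampling hyperparameters are placeholders, and it assumes GSM8K-style solutions that end in "#### <answer>".

```python
# Minimal sketch of rejection-sampling data collection for RFT.
# Assumes a Hugging Face causal LM and GSM8K-style solutions ending in "#### <answer>".
# Checkpoint path, prompt template, and hyperparameters are illustrative, not the repo's scripts.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SFT_CHECKPOINT = "path/to/sft-checkpoint"  # placeholder: an SFT model to sample from

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    SFT_CHECKPOINT, torch_dtype=torch.float16, device_map="auto"
)

def extract_answer(text):
    """Pull the final numeric answer from a GSM8K-style solution ('... #### 42')."""
    match = re.search(r"####\s*(-?[\d,.]+)", text)
    return match.group(1).replace(",", "") if match else None

def rejection_sample(question, gold_answer, k=8, max_new_tokens=512):
    """Sample k reasoning paths; keep only deduplicated ones whose final answer is correct."""
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    generations = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        num_return_sequences=k,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    completions = tokenizer.batch_decode(
        generations[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    kept = {c.strip() for c in completions if extract_answer(c) == gold_answer}
    # The surviving (question, reasoning path) pairs become extra supervised fine-tuning data.
    return [{"question": question, "response": c} for c in kept]
```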

Quick Start & Requirements

  • Installation: Primarily uses shell scripts for training and inference.
  • Prerequisites: Requires specific versions of Transformers (<= 4.29 recommended for reproducibility), Python, and CUDA-enabled GPUs; training is multi-GPU (on the order of 8 GPUs for 7B/13B models, 16 for 33B, and 32 for 65B/70B models).
  • Resources: Training LLaMA models requires substantial GPU resources. Inference can be accelerated with vLLM (see the sketch after this list).
  • Links: https://github.com/OFA-Sys/gsm8k-ScRel
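If the repository's shell scripts are adapted for standalone use, vLLM-accelerated inference looks roughly like the sketch below; the checkpoint path and prompt template are placeholders, not the repo's exact format.

```python
# Batched GSM8K-style inference with vLLM's Python API — a sketch, not the repo's own script.
# The checkpoint path and prompt format are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/rft-or-sft-checkpoint")          # placeholder checkpoint path
params = SamplingParams(temperature=0.0, max_tokens=512)  # greedy decoding for evaluation

questions = [
    "Natalia sold clips to 48 of her friends in April, and then she sold half as many "
    "clips in May. How many clips did Natalia sell altogether in April and May?",
]
prompts = [f"Question: {q}\nAnswer:" for q in questions]

for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text.strip())
```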

Highlighted Details

  • Achieves significant accuracy improvements on GSM8K and MATH datasets with various LLaMA model sizes (e.g., 82.5% GSM8K with MuggleMATH-70B).
  • Provides detailed benchmarks for different fine-tuning methods (ICL, SFT, RFT) across model sizes.
  • Includes specific scripts for training and evaluation, tailored for LLaMA and LLaMA-2 models.
  • Offers pre-trained checkpoints for RFT-tuned models and MuggleMATH models (loading example below).
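Loading one of the released checkpoints with Transformers would look roughly like this; the model ID below is an assumed example of an RFT checkpoint name, so verify the exact names in the README.

```python
# Hypothetical example of loading a released RFT checkpoint with Transformers.
# The model ID is an assumption for illustration; check the README for the exact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "OFA-Sys/gsm8k-rft-llama7b-u13b"  # assumed example checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

prompt = ("Question: A robe takes 2 bolts of blue fiber and half that much white fiber. "
          "How many bolts in total does it take?\nAnswer:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```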

Maintenance & Community

The project is associated with research papers published in 2023; key contributors are listed in the citations. No community links (Discord, Slack) are provided in the README.

Licensing & Compatibility

No license is explicitly stated in the README for the code and data, and the fine-tuned checkpoints are derived from LLaMA and LLaMA-2, so they are presumably subject to Meta's model licenses. Commercial use may require further investigation.

Limitations & Caveats

Reproducing exact results may depend on specific library versions (Transformers <= 4.29). The README notes that query and response augmentation may not improve out-of-domain generalization. The project focuses on LLaMA and LLaMA-2 architectures.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 90 days
