open-rs by knoveleng

Reinforcement learning for small LLM reasoning

Created 6 months ago
261 stars

Top 97.5% on SourcePulse

Project Summary

This repository provides code and datasets for improving the reasoning of small LLMs (1.5B parameters) with reinforcement learning. It targets researchers and practitioners with limited compute, and it reports gains on mathematical reasoning benchmarks from a low-cost fine-tuning approach.

How It Works

The project adapts the Group Relative Policy Optimization (GRPO) algorithm for fine-tuning small LLMs on a curated mathematical reasoning dataset. This approach aims to improve reasoning capabilities efficiently, achieving notable gains on benchmarks like AMC23 and AIME24 with a fraction of the data and cost of larger models.
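The core idea behind GRPO can be illustrated with a minimal sketch. It is not the repository's implementation. It assumes the commonly described formulation: several completions are sampled for one prompt, each is scored, and each reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The function name, the binary reward values, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scores for completions sampled from the SAME prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math problem:
# 1.0 = correct final answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions that beat the group average get a positive advantage,
# the others a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal under this formulation.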

Quick Start & Requirements

  • Install: Use uv for environment management. Install dependencies including vllm (v0.7.2) and flash-attn (requires PyTorch v2.5.1).
  • Prerequisites: Python 3.11, 4x NVIDIA A40 GPUs (48 GB VRAM each), Git LFS. Hugging Face and Weights & Biases authentication required.
  • Setup: Software setup is quick, but training requires the substantial GPU resources listed under Prerequisites.
  • Resources: Models, Datasets.

Highlighted Details

  • Achieves 80.0% accuracy on AMC23 and 46.7% on AIME24 with a 1.5B model.
  • Training cost estimated at $42 using 7,000 samples.
  • Outperforms larger models and previous preview versions on specific benchmarks.
  • Supports training and evaluation via accelerate and lighteval.

Maintenance & Community

  • Code and models are open-sourced.
  • Associated paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t".
  • Authors: Quy-Anh Dang and Chris Ngo.

Licensing & Compatibility

  • The repository does not explicitly state a license. Model weights are available on Hugging Face.

Limitations & Caveats

The project notes challenges with optimization instability and length constraints during extended training. PyTorch versions other than v2.5.1 may cause issues because of vLLM requirements.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 7 stars in the last 30 days
