open-rs by knoveleng

Reinforcement learning for small LLM reasoning

Created 9 months ago

271 stars

Top 95.1% on SourcePulse

Project Summary

This repository provides code and datasets for improving reasoning in small LLMs (1.5B parameters) with reinforcement learning. It targets researchers and practitioners working under resource constraints, and it reports significant gains on mathematical reasoning benchmarks from a low-cost fine-tuning approach.

How It Works

The project adapts the Group Relative Policy Optimization (GRPO) algorithm for fine-tuning small LLMs on a curated mathematical reasoning dataset. This approach aims to improve reasoning capabilities efficiently, achieving notable gains on benchmarks like AMC23 and AIME24 with a fraction of the data and cost of larger models.
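To make the "group relative" part of GRPO concrete, here is a minimal sketch of the advantage computation, assuming the common formulation in which each completion's reward is normalized against the other completions sampled for the same prompt. The function name, the `eps` constant, and the example rewards are hypothetical and are not taken from the repository, which may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is scored only relative to its siblings and no separate
    value network is needed. Population std is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 where the final answer was correct, 0.0 where it was not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative advantages. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.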

Quick Start & Requirements

Install: Use uv for environment management, then install the dependencies, including vllm (v0.7.2) and flash-attn, with PyTorch pinned to v2.5.1.
Prerequisites: Python 3.11, 4x NVIDIA A40 GPUs (48 GB VRAM each), Git LFS. Hugging Face and Weights & Biases authentication required.
Setup: Software setup is quick, but the hardware requirement (four 48 GB GPUs) is substantial.
Resources: Models, Datasets.
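Because the version pins above matter, a small preflight check can catch mismatches before a training run. The following is a sketch only: it assumes the packages are published under the usual distribution names `torch` and `vllm`, and the helper `check_environment` is hypothetical, not part of the repository.

```python
import sys
from importlib import metadata

# Version pins taken from the requirements listed above.
# The distribution names are assumptions.
EXPECTED = {"torch": "2.5.1", "vllm": "0.7.2"}


def check_environment(expected=EXPECTED):
    """Return a list of human-readable mismatches (empty if none found)."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        found_py = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.11 expected, found {found_py}")
    for package, wanted in expected.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        # Ignore local build tags such as "+cu124" when comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{package} {wanted} expected, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the listed pins"]:
        print(line)
```

The check reads only installed package metadata, so it runs without importing PyTorch or touching a GPU.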

Highlighted Details

Achieves 80.0% accuracy on AMC23 and 46.7% on AIME24 with a 1.5B model.
Training cost estimated at $42 using 7,000 samples.
Reported to outperform some larger models, including o1-preview, on specific benchmarks such as AIME24.
Supports training and evaluation via accelerate and lighteval.
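The headline numbers are easier to interpret as raw counts. The sketch below assumes the commonly used benchmark sizes of 40 problems for AMC23 and 30 for AIME24; these sizes are not stated in this summary, so treat them as assumptions. It also derives the per-sample cost implied by the $42 and 7,000-sample figures.

```python
# Assumed benchmark sizes (not stated in the summary above).
BENCHMARK_SIZES = {"AMC23": 40, "AIME24": 30}
# Accuracies reported in the summary above, in percent.
REPORTED = {"AMC23": 80.0, "AIME24": 46.7}


def implied_correct(accuracy_pct, n_problems):
    """Number of solved problems implied by a percentage accuracy."""
    return round(accuracy_pct / 100 * n_problems)


for name, acc in REPORTED.items():
    n = BENCHMARK_SIZES[name]
    k = implied_correct(acc, n)
    print(f"{name}: about {k}/{n} correct; one problem is worth {100 / n:.1f} points")

# Cost per training sample implied by the reported totals.
cost_per_sample = 42 / 7000
print(f"Implied cost per training sample: ${cost_per_sample:.3f}")
```

Under these assumptions a single problem moves the AIME24 score by about 3.3 points, so small differences between models on that benchmark should be read with caution.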

Maintenance & Community

Code and models are open-sourced.
Associated paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t".
Authors: Quy-Anh Dang and Chris Ngo.

Licensing & Compatibility

The repository does not explicitly state a license. Model weights are available on Hugging Face.

Limitations & Caveats

The project notes optimization instability and output-length constraints during extended training. Using a PyTorch version other than v2.5.1 may cause problems, because the pinned vLLM release depends on that version.

