JustRL by thunlp

Scaling LLMs with a simple RL recipe

Created 4 months ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

JustRL presents a streamlined approach to scaling large language models (LLMs) with reinforcement learning (RL), targeting 1.5B-parameter models. It offers a simple, single-stage training recipe with fixed hyperparameters that achieves state-of-the-art performance on mathematical reasoning tasks. In contrast to complex, multi-stage pipelines, it delivers competitive results at significantly lower computational cost and with greater training stability, making it valuable for researchers and practitioners seeking efficient LLM fine-tuning.

How It Works

JustRL's core innovation lies in its deliberate simplicity: a single-stage training process using standard GRPO with binary outcome rewards derived from a basic DAPO verifier (string-matching). It eschews multi-stage pipelines, dynamic schedules, and per-model hyperparameter tuning, instead relying on a fixed set of hyperparameters. This minimalist recipe ensures stable, monotonic performance improvements over extended training periods without oscillations or collapses, while achieving comparable or superior results to more complex methods with substantially less compute.
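To make the recipe concrete, here is a minimal sketch of the two ingredients named above: a binary outcome reward from a string-matching verifier, and GRPO's critic-free, group-relative advantage. The function names (`verify`, `grpo_advantages`) are illustrative, not from the repository, and this version uses population-standard-deviation normalization; exact details (answer extraction, normalization variant, clipping) differ in the actual implementation.

```python
import statistics

def verify(answer: str, gold: str) -> float:
    """Binary outcome reward via naive string matching.
    (Real verifiers also normalize LaTeX, extract boxed answers, etc.)"""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward by the
    mean and standard deviation of its own group, so no learned value
    function (critic) is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts got the same reward: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored against the gold answer "42":
group = [verify(a, "42") for a in ["42", "41", "42 ", "7"]]
print(grpo_advantages(group))  # correct rollouts get positive advantage
```

The advantage weights then scale the policy-gradient update on each rollout's tokens; with binary rewards, correct answers in a mixed group are pushed up and incorrect ones pushed down, relative to the group average.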

Quick Start & Requirements

  • Installation: Recommended via a conda environment: conda create -n justrl python=3.10 followed by conda activate justrl.
  • Key Dependencies: PyTorch (2.6.0), vLLM (0.8.4), transformers (4.51.3), sympy (1.13.1), pylatexenc (2.10).
  • Data: Requires downloading large evaluation output files from a provided Google Drive link and extracting them to the repository root.
  • Links: The project is associated with an ICLR 2026 Blogpost Track submission and provides a citation for the paper "JustRL: Scaling a 1.5B LLM with a Simple RL Recipe".

Highlighted Details

  • Achieves state-of-the-art performance on mathematical reasoning benchmarks for 1.5B LLMs.
  • Delivers comparable or better results using roughly half the compute of sophisticated, multi-stage RL approaches.
  • Demonstrates robustness and reproducibility by applying identical, fixed hyperparameters across different 1.5B base models (DeepSeek and Nemotron).
  • Provides complete evaluation scripts and released model weights for JustRL-DeepSeek-1.5B and JustRL-Nemotron-1.5B.

Maintenance & Community

Information regarding project maintainers, community channels (e.g., Discord, Slack), or specific development roadmaps is not detailed in the provided README excerpt.

Licensing & Compatibility

The README excerpt does not specify the software license. Consequently, compatibility for commercial use or linking with closed-source projects cannot be determined without further information.

Limitations & Caveats

The repository primarily focuses on evaluation scripts and released models, with limited explicit detail on the full training pipeline setup. The absence of a specified license presents a potential adoption blocker for commercial applications. Hardware requirements beyond core dependencies are not detailed.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 13 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 3 more.

ROLL by alibaba

RL library for large language models

  • Top 1.4% on SourcePulse
  • 3k stars
  • Created 9 months ago, updated 1 day ago