thunlp
Scaling LLMs with a simple RL recipe
JustRL is a streamlined approach to scaling large language models (LLMs) with reinforcement learning (RL), targeting 1.5B-parameter models. It uses a simple, single-stage training recipe with fixed hyperparameters and reports state-of-the-art performance on mathematical reasoning tasks. In contrast to complex, multi-stage pipelines, it reports competitive results with significantly lower computational cost and more stable training, which makes it useful for researchers and practitioners seeking efficient LLM fine-tuning.
How It Works
JustRL's core innovation lies in its deliberate simplicity: a single-stage training process using standard GRPO with binary outcome rewards derived from a basic DAPO verifier (string-matching). It eschews multi-stage pipelines, dynamic schedules, and per-model hyperparameter tuning, instead relying on a fixed set of hyperparameters. This minimalist recipe ensures stable, monotonic performance improvements over extended training periods without oscillations or collapses, while achieving comparable or superior results to more complex methods with substantially less compute.
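As a rough illustration of the two ingredients named above, the sketch below computes a binary outcome reward by string matching and then the group-relative advantages that GRPO builds on. It is not taken from the JustRL code: the function names, the "Answer:" extraction marker, and the use of the population standard deviation are all assumptions made for the example, and the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev
from typing import List


def binary_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference string, else 0.0."""
    # Hypothetical extraction rule: use the text after the last "Answer:" marker.
    marker = "Answer:"
    answer = response.rsplit(marker, 1)[-1] if marker in response else response
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled responses (made-up strings for illustration).
responses = [
    "... so x = 4. Answer: 4",
    "... Answer: 5",
    "... Answer: 4",
    "... Answer: 7",
]
rewards = [binary_reward(r, "4") for r in responses]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct responses receive a positive advantage and incorrect ones a negative advantage, while a group in which every response gets the same reward yields all-zero advantages and so contributes no learning signal.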
Quick Start & Requirements
Create the environment with conda create -n justrl python=3.10, then activate it with conda activate justrl.
Highlighted Details
Maintenance & Community
Information regarding project maintainers, community channels (e.g., Discord, Slack), or specific development roadmaps is not detailed in the provided README excerpt.
Licensing & Compatibility
The README excerpt does not specify the software license. Consequently, compatibility for commercial use or linking with closed-source projects cannot be determined without further information.
Limitations & Caveats
The repository primarily focuses on evaluation scripts and released models, with limited explicit detail on the full training pipeline setup. The absence of a specified license presents a potential adoption blocker for commercial applications. Hardware requirements beyond core dependencies are not detailed.