AReaL by inclusionAI

Distributed RL system for LLM reasoning

created 5 months ago
2,108 stars

Top 21.8% on sourcepulse

Project Summary

AReaL is a distributed reinforcement learning system designed for training Large Language Models (LLMs) to enhance their reasoning capabilities, particularly in areas like mathematics. It targets researchers and developers aiming to build custom AI agents efficiently and affordably, offering reproducible training details, datasets, and infrastructure.

How It Works

AReaL builds on the RealHF project to provide a scalable, distributed reinforcement learning framework. It incorporates system-level optimizations, including SGLang support, to accelerate rollout and training. Training centers on Proximal Policy Optimization (PPO) to fine-tune LLMs against reward signals, enabling state-of-the-art performance on reasoning benchmarks.
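To make the PPO step concrete, here is a minimal sketch of the clipped surrogate loss that PPO-style trainers optimize. This is an illustrative helper, not AReaL's actual implementation; the function name and the per-token list inputs are assumptions for the example.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (averaged, to be minimized)."""
    losses = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the rollout policy.
        ratio = math.exp(ln - lo)
        # Clip the ratio to [1 - eps, 1 + eps] to keep updates conservative.
        clipped_ratio = max(1 - clip_eps, min(ratio, 1 + clip_eps))
        # Take the pessimistic (smaller) surrogate; negate for minimization.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)
```

With identical policies (`logp_new == logp_old`) the ratio is 1 and the loss reduces to the negated mean advantage; when the new policy drifts, the clip bounds how much any one sample can push the update.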

Quick Start & Requirements

  • Install/Run:
    • Train: python3 -m realhf.apps.quickstart ppo-math --config examples/configs/7B-distill/ppo-7B-distill-gpus-128.yaml
    • Evaluate: python evaluation/eval_and_aggregate.py --model_path ${MODEL_PATH} --output_path ${OUTPUT_PATH} --data_names aime24,aime25 --prompt_type AReaL-boba --temperature 1.0
  • Prerequisites: Requires significant GPU resources (e.g., 128 GPUs for 7B model training). Specific Python versions and dependencies are detailed in the project's documentation.
  • Resources: Training times vary significantly with model size and GPU count, ranging from hours to days.
Highlighted Details

  • Achieves state-of-the-art (SOTA) performance on math reasoning benchmarks (AIME 2024/2025) for 7B models, improving scores by up to 8.6 points.
  • Demonstrates ability to replicate QwQ-32B performance on AIME 2024 with only 200 data samples via Supervised Fine-Tuning (SFT).
  • Offers up to 1.5x speedup in training for 7B models with SGLang support and system-level optimizations.
  • Provides reproducible training data and recipes for 1.5B, 7B, and 32B models.

Maintenance & Community

  • Developed by RL Lab, Ant Research and Institute for Interdisciplinary Information Sciences, Tsinghua University, with assistance from Ant Group's Super Computing Technology team.
  • Acknowledges contributions from projects like RealHF, DeepScaleR, OpenRLHF, and SGLang.
  • Active development with weekly releases planned.

Licensing & Compatibility

  • The project is described as fully open-sourced, but the README does not state a specific license. Verify the license file in the repository before assuming permissive terms or compatibility with commercial use.

Limitations & Caveats

The project is under active development; its roadmap includes RL training on coding problems, asynchronous training, and RL for vision-language models. It shows strong performance in math reasoning, but support for other domains is still a work in progress.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 53
  • Issues (30d): 14
  • Star History: 949 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

SWE-Gym by SWE-Gym

  • 513 stars
  • Environment for training software engineering agents
  • created 9 months ago, updated 4 days ago