AReaL by inclusionAI

Distributed RL system for LLM reasoning

created 5 months ago
2,108 stars

Top 21.8% on sourcepulse

Project Summary

AReaL is a distributed reinforcement learning system designed for training Large Language Models (LLMs) to enhance their reasoning capabilities, particularly in areas like mathematics. It targets researchers and developers aiming to build custom AI agents efficiently and affordably, offering reproducible training details, datasets, and infrastructure.

How It Works

AReaL builds on the RealHF project to provide a scalable, distributed reinforcement learning framework. It incorporates system-level optimizations, including SGLang support, to accelerate rollout and training. Training centers on Proximal Policy Optimization (PPO) to fine-tune LLMs against reward signals, enabling state-of-the-art performance on reasoning benchmarks.
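To make the PPO step concrete, here is a minimal sketch of the clipped surrogate loss that PPO-style trainers optimize. This is an illustrative helper, not AReaL's actual implementation; the function name and the per-token list inputs are assumptions for the example.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (averaged, to be minimized)."""
    losses = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the rollout policy.
        ratio = math.exp(ln - lo)
        # Clip the ratio to [1 - eps, 1 + eps] to keep updates conservative.
        clipped_ratio = max(1 - clip_eps, min(ratio, 1 + clip_eps))
        # Take the pessimistic (smaller) surrogate; negate for minimization.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)
```

With identical policies (`logp_new == logp_old`) the ratio is 1 and the loss reduces to the negated mean advantage; when the new policy drifts, the clip bounds how much any one sample can push the update.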

Quick Start & Requirements

  • Install/Run:
    • Train: python3 -m realhf.apps.quickstart ppo-math --config examples/configs/7B-distill/ppo-7B-distill-gpus-128.yaml
    • Evaluate: python evaluation/eval_and_aggregate.py --model_path ${MODEL_PATH} --output_path ${OUTPUT_PATH} --data_names aime24,aime25 --prompt_type AReaL-boba --temperature 1.0
  • Prerequisites: Requires significant GPU resources (e.g., 128 GPUs for 7B model training). Specific Python versions and dependencies are detailed in the project's documentation.
  • Resources: Training times vary significantly with model size and GPU count, ranging from hours to days.
Highlighted Details

  • Achieves state-of-the-art (SOTA) performance on math reasoning benchmarks (AIME 2024/2025) for 7B models, improving scores by up to 8.6 points.
  • Demonstrates ability to replicate QwQ-32B performance on AIME 2024 with only 200 data samples via Supervised Fine-Tuning (SFT).
  • Offers up to 1.5x speedup in training for 7B models with SGLang support and system-level optimizations.
  • Provides reproducible training data and recipes for 1.5B, 7B, and 32B models.

Maintenance & Community

  • Developed by RL Lab, Ant Research and Institute for Interdisciplinary Information Sciences, Tsinghua University, with assistance from Ant Group's Super Computing Technology team.
  • Acknowledges contributions from projects like RealHF, DeepScaleR, OpenRLHF, and SGLang.
  • Active development with weekly releases planned.

Licensing & Compatibility

  • The project is described as fully open-sourced, but the README does not state a specific license. Verify the license file in the repository before assuming permissive terms or compatibility with commercial use.

Limitations & Caveats

The project is under active development; its roadmap includes RL training on coding problems, asynchronous training, and RL for vision-language models. It shows strong performance in math reasoning, but support for other domains is still a work in progress.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 53
  • Issues (30d): 14
  • Star History: 949 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

SWE-Gym by SWE-Gym

  • 513 stars
  • Environment for training software engineering agents
  • created 9 months ago, updated 4 days ago