RL2 by ChenmienTan

Reinforcement learning for large language models

Created 9 months ago

1,019 stars

Top 36.7% on SourcePulse

View on GitHub

6 Experts Love This Project

Alexander Wettig

Coauthor of SWE-bench, SWE-agent

Wing Lian

Founder of Axolotl AI

Will Brown

Research Lead at Prime Intellect

Jared Palmer

SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX

and 2 more!

Project Summary

RL2 is a reinforcement learning library for large language models, designed for researchers and practitioners who need a concise and efficient tool for experimenting with and deploying RL algorithms. It offers a production-ready framework with clear implementations, enabling users to scale to large models (e.g., 72B parameters) through advanced parallelism techniques and optimized inference.

How It Works

RL2 leverages Fully Sharded Data Parallelism (FSDP) and Tensor Parallelism (TP) for efficient model partitioning, allowing it to handle large language models. It incorporates sequence parallelism via ZigZag Ring Attention and KV cache partitioning through TP for enhanced inference throughput. The library also supports balanced sequence packing and multi-turn rollouts with an asynchronous inference engine, contributing to its production-readiness.

Quick Start & Requirements

Installation: Clone the repository and install using pip install -e ..
Data: Supports Hugging Face datasets and various file formats (JSON, JSONL, CSV, Parquet, Arrow). Specific formats are required for SFT, RM, DPO, and PPO training.
Rewards: Reward functions can be specified via a Python script, allowing integration with external reward models served by engines like vLLM or SGLang.
Tools: Supports multi-turn rollouts with function calling, requiring specific parsing of messages and tool interactions within the reward function script.
Training: Launched using torchrun for both single-node and multi-node distributed training.
Dependencies: Requires PyTorch. Specific hardware (e.g., GPUs) and CUDA versions are implied for large model training.

Highlighted Details

Supports model partitioning via ZeRO stage 3, DDP, and Tensor Parallelism.
Features efficient sequence parallelism with ZigZag Ring Attention.
Includes multi-turn rollout with function calling capabilities.
Offers Dr. GRPO as the default RL algorithm, with options for PPO and GRPO.

Maintenance & Community

The project is associated with Accio, an AI sourcing engine, and is actively seeking talent in agent and reinforcement learning. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The repository is hosted on GitHub, implying a permissive license, but the specific license type and compatibility for commercial use are not detailed in the provided README.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or unsupported platforms. The project's "production-ready" claim is supported by references to specific model benchmarks on Wandb, but detailed performance metrics or comparisons are not included.

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

86 stars in the last 30 days