RL2 by ChenmienTan

Reinforcement learning for large language models

Created 5 months ago
865 stars

Top 41.5% on SourcePulse

Project Summary

RL2 is a reinforcement learning library for large language models, designed for researchers and practitioners who need a concise and efficient tool for experimenting with and deploying RL algorithms. It offers a production-ready framework with clear implementations, enabling users to scale to large models (e.g., 72B parameters) through advanced parallelism techniques and optimized inference.

How It Works

RL2 leverages Fully Sharded Data Parallelism (FSDP) and Tensor Parallelism (TP) for efficient model partitioning, allowing it to handle large language models. It incorporates sequence parallelism via ZigZag Ring Attention and KV cache partitioning through TP for enhanced inference throughput. The library also supports balanced sequence packing and multi-turn rollouts with an asynchronous inference engine, contributing to its production-readiness.
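The two load-balancing ideas named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not RL2's implementation: zigzag_shard and balanced_pack are hypothetical names, the zigzag layout follows the commonly described scheme in which a sequence is cut into 2 * world_size chunks and rank r keeps chunk r plus its mirror chunk, and the packing uses a generic greedy longest-first heuristic that RL2 may or may not use.

```python
from typing import List, Sequence


def zigzag_shard(tokens: Sequence[int], world_size: int) -> List[List[int]]:
    """Cut a sequence into 2 * world_size chunks; rank r keeps chunks r and (2 * world_size - 1 - r).

    Under causal attention, early chunks attend to few tokens and late chunks
    to many, so pairing each chunk with its mirror roughly equalizes the work.
    """
    num_chunks = 2 * world_size
    if len(tokens) % num_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * world_size")
    size = len(tokens) // num_chunks
    chunks = [list(tokens[i * size:(i + 1) * size]) for i in range(num_chunks)]
    return [chunks[r] + chunks[num_chunks - 1 - r] for r in range(world_size)]


def balanced_pack(lengths: Sequence[int], num_bins: int) -> List[List[int]]:
    """Greedy longest-first packing: place each sequence in the currently lightest bin.

    Returns, for each bin, the indices of the sequences assigned to it.
    """
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    loads = [0] * num_bins
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for idx in order:
        target = loads.index(min(loads))  # lightest bin so far
        bins[target].append(idx)
        loads[target] += lengths[idx]
    return bins


if __name__ == "__main__":
    print(zigzag_shard(list(range(8)), world_size=2))
    print(balanced_pack([7, 5, 4, 3, 1], num_bins=2))
```

With 8 tokens on 2 ranks, rank 0 holds the first and last quarter of the sequence and rank 1 holds the middle two quarters. In the packing example both bins end up with 10 tokens.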

Quick Start & Requirements

  • Installation: Clone the repository and, from its root directory, install with pip install -e . (an editable install).
  • Data: Supports Hugging Face datasets and various file formats (JSON, JSONL, CSV, Parquet, Arrow). Specific formats are required for SFT, RM, DPO, and PPO training.
  • Rewards: Reward functions can be specified via a Python script, allowing integration with external reward models served by engines like vLLM or SGLang.
  • Tools: Supports multi-turn rollouts with function calling, requiring specific parsing of messages and tool interactions within the reward function script.
  • Training: Launched using torchrun for both single-node and multi-node distributed training.
  • Dependencies: Requires PyTorch. The README does not list hardware or CUDA version requirements, but training large models presumably needs CUDA-capable GPUs.
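To make the reward-script idea concrete, here is a minimal sketch of a rule-based reward function. The function names, the (response, answer) signature, and the boxed-answer convention are all assumptions for illustration; the interface RL2 actually expects is defined in its README and may differ.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the contents of the last LaTeX boxed expression in the response, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def reward_fn(response: str, answer: str) -> float:
    """Hypothetical reward: 1.0 if the boxed answer matches the reference exactly, else 0.0."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == answer.strip() else 0.0


if __name__ == "__main__":
    print(reward_fn("So the result is \\boxed{42}.", "42"))
    print(reward_fn("I am not sure.", "42"))
```

A reward backed by an external reward model would replace the string comparison with a request to the serving engine, which is omitted here because the endpoint and payload depend on the deployment.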

Highlighted Details

  • Supports data parallelism via DDP and model partitioning via ZeRO stage 3 (the sharding scheme FSDP implements) and Tensor Parallelism.
  • Features efficient sequence parallelism with ZigZag Ring Attention.
  • Includes multi-turn rollout with function calling capabilities.
  • Offers Dr. GRPO as the default RL algorithm, with options for PPO and GRPO.
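The difference between GRPO and Dr. GRPO is easiest to see in how each turns a group of rewards for the same prompt into advantages. The sketch below is illustrative only and is not taken from RL2: the function names are hypothetical, the population standard deviation and the eps value are assumptions, and it covers only the advantage step (Dr. GRPO as published also drops the per-response length normalization in the loss, which is not shown).

```python
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: subtract the group mean, then divide by the group std."""
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps  # eps avoids division by zero when all rewards are equal
    return [(r - baseline) / scale for r in rewards]


def dr_grpo_advantages(rewards: Sequence[float]) -> List[float]:
    """Dr. GRPO-style advantages: subtract the group mean only, with no std scaling."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    group = [1.0, 0.0, 0.0, 1.0]  # rewards for four responses to one prompt
    print(grpo_advantages(group))
    print(dr_grpo_advantages(group))
```

Dropping the standard-deviation term means prompts whose responses nearly all succeed or nearly all fail are no longer given inflated advantages relative to prompts with mixed outcomes.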

Maintenance & Community

The project is associated with Accio, an AI sourcing engine, and is actively seeking talent in agent and reinforcement learning. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not state a license. Hosting on GitHub does not by itself grant any usage rights, so check the repository for a license file before relying on the code, particularly for commercial use.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or unsupported platforms. The project's "production-ready" claim rests on links to Wandb benchmark runs for specific models; no detailed performance metrics or comparisons are included.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 3

Star History

109 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0.4%
455
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 5 months ago
Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
790
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0%
3k
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago
Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

0.1%
41k
AI system for large-scale parallel training
Created 3 years ago
Updated 15 hours ago