ROLL by alibaba

RL library for large language models

Created 3 months ago
1,913 stars

Top 22.8% on SourcePulse

Project Summary

ROLL is an open-source library designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs) using distributed GPU resources. It targets AI labs, hyperscalers, and product developers aiming to enhance LLM capabilities in areas like human preference alignment, complex reasoning, and agentic interactions, offering significant speedups and cost reductions.

How It Works

ROLL employs a multi-role distributed architecture, leveraging Ray for flexible resource allocation and heterogeneous task scheduling. It integrates with high-performance backends such as Megatron-Core, SGLang, and vLLM to accelerate training and inference. The library emphasizes efficient data handling, including sample filtering based on difficulty and length, and provides advanced techniques for stabilizing training, such as value/advantage clipping and reward normalization.
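The stabilization techniques mentioned above can be sketched in plain Python. This is an illustrative sketch, not ROLL's actual implementation: the function names, the filtering criteria (token length and per-prompt accuracy as a difficulty proxy), and the thresholds are all assumptions.

```python
from statistics import mean, pstdev

def normalize_rewards(rewards, eps=1e-8):
    """Normalize a batch of scalar rewards to zero mean and unit
    variance, a common trick for stabilizing RL training."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def filter_samples(samples, min_len=4, max_len=2048,
                   min_acc=0.05, max_acc=0.95):
    """Drop samples that are too short/long, or whose group accuracy
    marks them as trivially easy or impossibly hard. Each sample is
    assumed to carry a token 'length' and a rollout 'accuracy'."""
    return [s for s in samples
            if min_len <= s["length"] <= max_len
            and min_acc <= s["accuracy"] <= max_acc]
```

Filtering out prompts the policy always (or never) solves keeps gradient signal concentrated on informative samples; normalizing rewards keeps advantage magnitudes comparable across batches.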

Quick Start & Requirements

  • Installation: pip install -e . (from source)
  • Prerequisites: Python 3.10+, PyTorch, Ray, Megatron-Core, SGLang, vLLM. Specific hardware requirements depend on model size and training scale, but large-scale GPU clusters are assumed for optimal use.
  • Resources: Setup time and resource footprint are highly variable based on the scale of LLM and GPU cluster.
  • Documentation: Quick Start, Installation, RLVR Pipeline, Agentic RL Pipeline.

Highlighted Details

  • Supports LLMs up to 200B+ parameters across thousands of GPUs with fault tolerance.
  • Offers flexible hardware usage with colocation/disaggregation and sync/async execution modes.
  • Features a compositional sample-reward route for dynamic task routing and custom reward/environment workers.
  • Includes advanced RL tuning techniques like dual clip loss, advantage whitening, and token-level KL regularization.
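Two of the tuning techniques listed above, dual clip loss and token-level KL regularization, can be sketched as follows. This is a minimal per-token illustration of the general technique, not ROLL's API: the function name, hyperparameter defaults, and the use of log-probability difference as a KL estimate are assumptions.

```python
import math

def dual_clip_ppo_loss(logp, logp_old, logp_ref, advantages,
                       clip_eps=0.2, dual_clip=3.0, kl_beta=0.01):
    """Mean per-token dual-clip PPO loss with a token-level KL penalty
    toward a frozen reference policy. Inputs are parallel lists of
    per-token log-probs and advantages."""
    losses = []
    for lp, lp_old, lp_ref, adv in zip(logp, logp_old, logp_ref, advantages):
        ratio = math.exp(lp - lp_old)
        # Standard PPO clipped surrogate.
        surr = min(ratio * adv,
                   max(min(ratio, 1 + clip_eps), 1 - clip_eps) * adv)
        # Dual clip: for negative advantages, bound the loss from below
        # so a single bad token with a huge ratio cannot dominate.
        if adv < 0:
            surr = max(surr, dual_clip * adv)
        # Token-level KL estimate toward the reference model.
        kl = lp - lp_ref
        losses.append(-surr + kl_beta * kl)
    return sum(losses) / len(losses)
```

Applying the KL penalty per token (rather than folding it into the sequence-level reward) gives a denser regularization signal, which is one reason libraries in this space expose it as a separate option.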

Maintenance & Community

Developed by Alibaba's TAOBAO & TMALL Group and Alibaba Group. The project posts regular updates and has published a tech report. Community contributions are welcome.

Licensing & Compatibility

Licensed under the Apache License (Version 2.0). The project utilizes third-party components under other open-source licenses, as detailed in the NOTICE file. Compatible with commercial use.

Limitations & Caveats

The project is actively under development with upcoming features like Qwen2.5 VL RL pipeline and FSDP2 integration. While it supports single-GPU setups, its primary design focus is on large-scale GPU clusters.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 12
  • Issues (30d): 16
  • Star History: 184 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

RL recipe for reasoning ability in models

Top 0.1% on SourcePulse · 4k stars
Created 7 months ago · Updated 1 month ago