ROLL by alibaba

RL library for large language models

Created 3 months ago
1,913 stars

Top 22.8% on SourcePulse

Project Summary

ROLL is an open-source library designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs) using distributed GPU resources. It targets AI labs, hyperscalers, and product developers aiming to enhance LLM capabilities in areas like human preference alignment, complex reasoning, and agentic interactions, offering significant speedups and cost reductions.

How It Works

ROLL employs a multi-role distributed architecture, leveraging Ray for flexible resource allocation and heterogeneous task scheduling. It integrates with high-performance backends such as Megatron-Core, SGLang, and vLLM to accelerate training and inference. The library emphasizes efficient data handling, including sample filtering based on difficulty and length, and provides advanced techniques for stabilizing training, such as value/advantage clipping and reward normalization.
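The stabilization techniques mentioned above can be sketched in plain Python. This is an illustrative sketch, not ROLL's actual implementation: the function names, the filtering criteria (token length and per-prompt accuracy as a difficulty proxy), and the thresholds are all assumptions.

```python
from statistics import mean, pstdev

def normalize_rewards(rewards, eps=1e-8):
    """Normalize a batch of scalar rewards to zero mean and unit
    variance, a common trick for stabilizing RL training."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def filter_samples(samples, min_len=4, max_len=2048,
                   min_acc=0.05, max_acc=0.95):
    """Drop samples that are too short/long, or whose group accuracy
    marks them as trivially easy or impossibly hard. Each sample is
    assumed to carry a token 'length' and a rollout 'accuracy'."""
    return [s for s in samples
            if min_len <= s["length"] <= max_len
            and min_acc <= s["accuracy"] <= max_acc]
```

Filtering out prompts the policy always (or never) solves keeps gradient signal concentrated on informative samples; normalizing rewards keeps advantage magnitudes comparable across batches.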

Quick Start & Requirements

  • Installation: pip install -e . (from source)
  • Prerequisites: Python 3.10+, PyTorch, Ray, Megatron-Core, SGLang, vLLM. Specific hardware requirements depend on model size and training scale, but large-scale GPU clusters are assumed for optimal use.
  • Resources: Setup time and resource footprint are highly variable based on the scale of LLM and GPU cluster.
  • Documentation: Quick Start, Installation, RLVR Pipeline, Agentic RL Pipeline.

Highlighted Details

  • Supports LLMs up to 200B+ parameters across thousands of GPUs with fault tolerance.
  • Offers flexible hardware usage with colocation/disaggregation and sync/async execution modes.
  • Features a compositional sample-reward route for dynamic task routing and custom reward/environment workers.
  • Includes advanced RL tuning techniques like dual clip loss, advantage whitening, and token-level KL regularization.
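Two of the tuning techniques listed above, dual clip loss and token-level KL regularization, can be sketched as follows. This is a minimal per-token illustration of the general technique, not ROLL's API: the function name, hyperparameter defaults, and the use of log-probability difference as a KL estimate are assumptions.

```python
import math

def dual_clip_ppo_loss(logp, logp_old, logp_ref, advantages,
                       clip_eps=0.2, dual_clip=3.0, kl_beta=0.01):
    """Mean per-token dual-clip PPO loss with a token-level KL penalty
    toward a frozen reference policy. Inputs are parallel lists of
    per-token log-probs and advantages."""
    losses = []
    for lp, lp_old, lp_ref, adv in zip(logp, logp_old, logp_ref, advantages):
        ratio = math.exp(lp - lp_old)
        # Standard PPO clipped surrogate.
        surr = min(ratio * adv,
                   max(min(ratio, 1 + clip_eps), 1 - clip_eps) * adv)
        # Dual clip: for negative advantages, bound the loss from below
        # so a single bad token with a huge ratio cannot dominate.
        if adv < 0:
            surr = max(surr, dual_clip * adv)
        # Token-level KL estimate toward the reference model.
        kl = lp - lp_ref
        losses.append(-surr + kl_beta * kl)
    return sum(losses) / len(losses)
```

Applying the KL penalty per token (rather than folding it into the sequence-level reward) gives a denser regularization signal, which is one reason libraries in this space expose it as a separate option.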

Maintenance & Community

Developed by Alibaba's TAOBAO & TMALL Group and Alibaba Group. The project posts regular updates and has published a tech report. Community contributions are welcome.

Licensing & Compatibility

Licensed under the Apache License (Version 2.0). The project utilizes third-party components under other open-source licenses, as detailed in the NOTICE file. Compatible with commercial use.

Limitations & Caveats

The project is actively under development with upcoming features like Qwen2.5 VL RL pipeline and FSDP2 integration. While it supports single-GPU setups, its primary design focus is on large-scale GPU clusters.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 12
  • Issues (30d): 16
  • Star History: 184 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

RL recipe for reasoning ability in models

Top 0.1% on SourcePulse · 4k stars
Created 7 months ago · Updated 1 month ago