PipelineRL by ServiceNow

Scalable RL for training LLM agents

Created 9 months ago

345 stars

Top 80.3% on SourcePulse

2 Experts Love This Project

Lewis Tunstall

Research Engineer at Hugging Face

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

PipelineRL is a scalable asynchronous reinforcement learning framework for efficient LLM agent training. It addresses the classic trade-off between inference throughput and data freshness with in-flight weight updates: inference servers receive new weights while they keep generating, which enables faster, more stable RL training and high GPU utilization. The framework targets researchers and engineers developing LLM-based agents.

How It Works

The core idea is an asynchronous RL pipeline that updates weights "in-flight." After each optimizer step, the updated model weights are broadcast to the inference servers without pausing data sampling. Sampled data therefore stays near on-policy even at the large batch sizes needed to keep GPUs fully utilized. PipelineRL uses a simplified version of GRPO: no value network, no trust-region clamping, and no KL or entropy bonuses by default, though KL is supported.
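A toy illustration of the in-flight update idea may help. The sketch below is not PipelineRL code: ToyInferenceServer, generate_with_inflight_updates, max_token_lag, and the tokens_per_optimizer_step parameter are hypothetical names, and a single loop stands in for what would be separate trainer and inference processes. It only shows that when weights are swapped without pausing generation, one sequence can contain tokens sampled under several weight versions.

```python
from dataclasses import dataclass


@dataclass
class ToyInferenceServer:
    """Stand-in for an inference server; tracks only which weights are loaded."""

    weight_version: int = 0

    def load_weights(self, version: int) -> None:
        # A real server would receive broadcast tensors; here it is just a counter.
        self.weight_version = version

    def sample_token(self, position: int) -> tuple[int, int]:
        # Return (token position, weight version used to sample it).
        return (position, self.weight_version)


def generate_with_inflight_updates(
    seq_len: int, tokens_per_optimizer_step: int
) -> list[tuple[int, int]]:
    """Generate one sequence while a simulated trainer keeps publishing weights."""
    if tokens_per_optimizer_step <= 0:
        raise ValueError("tokens_per_optimizer_step must be positive")
    server = ToyInferenceServer()
    sequence = []
    for position in range(seq_len):
        sequence.append(server.sample_token(position))
        # Assumption: an optimizer step completes every N sampled tokens.
        if (position + 1) % tokens_per_optimizer_step == 0:
            # Weights are swapped without restarting or pausing the sequence.
            server.load_weights(server.weight_version + 1)
    return sequence


def max_token_lag(sequence: list[tuple[int, int]], latest_version: int) -> int:
    """Largest gap between the newest weights and the weights behind any token."""
    return max((latest_version - version for _, version in sequence), default=0)
```

With seq_len=6 and tokens_per_optimizer_step=2, the recorded weight versions are 0, 0, 1, 1, 2, 2. A design that waited for every sequence to finish before updating would instead sample all six tokens under version 0.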

Quick Start & Requirements

Installation consists of cloning the repository, creating a Python 3.11 conda environment, installing PyTorch 2.6.0, and installing the package in editable mode (pip install -e . --no-build-isolation). Redis is optional and can be installed to stream data between processes instead of using the file system. Training is launched with python -m pipelinerl.launch, given a configuration name (e.g., guessing or base_4gpu) and an output directory.

Highlighted Details

Achieves performance matching or exceeding Open-Reasoner-Zero on AIME-2024 and MATH-500 benchmarks using 7B and 32B models.
Maximizes GPU utilization by balancing high inference throughput with on-policy data freshness through continuous weight updates.
Features a modular, Hydra-configured architecture comprising Orchestrator, Inference Servers, Actor, Preprocessor, Trainer, and Verifier components.
Supports both file system (default) and Redis for data streaming between pipeline stages.
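To make the file-system streaming option concrete, here is a minimal sketch of one stage appending records to a file while a downstream stage reads only what is new. It assumes a JSON-lines layout and byte offsets for bookkeeping; the function names, the record fields, and the file name are hypothetical, and PipelineRL's actual on-disk format is not described in this summary.

```python
import json
import tempfile
from pathlib import Path


def append_records(stream_path: Path, records: list[dict]) -> None:
    """Producer side: append one JSON object per line."""
    with stream_path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_new_records(stream_path: Path, offset: int) -> tuple[list[dict], int]:
    """Consumer side: read complete lines written after `offset` (in bytes).

    Returns the parsed records and the offset to resume from next time.
    """
    with stream_path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume up to the last newline so a half-written line is left for later.
    complete, sep, _partial = data.rpartition(b"\n")
    if not sep:
        return [], offset
    records = [json.loads(line) for line in complete.split(b"\n") if line]
    return records, offset + len(complete) + len(sep)


def demo() -> list[dict]:
    """Write two batches and read them back incrementally."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "actor_to_trainer.jsonl"  # hypothetical file name
        append_records(path, [{"sample_id": 0}, {"sample_id": 1}])
        first, offset = read_new_records(path, 0)
        append_records(path, [{"sample_id": 2}])
        second, _ = read_new_records(path, offset)
        return first + second
```

Because files written this way only grow, long runs accumulate data on disk, which is one reason a broker such as Redis can be preferable for larger setups.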

Maintenance & Community

The provided README does not detail community channels (e.g., Discord, Slack), specific contributors, sponsorships, or a public roadmap.

Licensing & Compatibility

The README does not specify the repository's license, so its suitability for commercial use or integration into closed-source projects is undetermined.

Limitations & Caveats

The default file system-based data streaming can consume substantial disk space; Redis is recommended for more robust or distributed setups. The simplified GRPO implementation runs without a value network, trust-region clamping, or KL/entropy bonuses by default, though KL support can be enabled.

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

18 stars in the last 30 days