PipelineRL by ServiceNow

Scalable RL for training LLM agents

Created 6 months ago
271 stars

Top 94.9% on SourcePulse

Project Summary

PipelineRL is a scalable asynchronous reinforcement learning framework designed for efficient LLM agent training. It addresses the classic trade-off between inference throughput and data freshness by performing in-flight weight updates, enabling faster, more stable RL training while maximizing GPU utilization. The framework targets researchers and engineers developing LLM-based agents.

How It Works

The core innovation is an asynchronous RL pipeline that performs weight updates "in-flight": after each optimizer step, updated model weights are broadcast to the inference servers without pausing data sampling. This keeps the sampled data near on-policy while allowing large batch sizes, thereby maximizing GPU utilization. PipelineRL employs a simplified GRPO algorithm that omits value networks, trust-region clamping, and KL/entropy bonuses by default for streamlined training, though KL support remains available.
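The in-flight update loop above can be sketched in miniature. This is a hypothetical illustration of the concurrency pattern, not PipelineRL's actual API: names like `WeightStore`, `sampler`, and `trainer` are invented, and real weight broadcasts would move tensors over NCCL or similar rather than bump a counter.

```python
import queue
import threading

# Hedged sketch of in-flight weight updates: the sampler keeps generating
# rollouts while the trainer publishes new weight versions, so inference
# never pauses. All names here are illustrative, not PipelineRL's API.

class WeightStore:
    """Latest weights, readable by the sampler without blocking it."""
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0

    def broadcast(self):
        # Trainer publishes updated weights after each optimizer step.
        with self._lock:
            self.version += 1

    def current_version(self):
        with self._lock:
            return self.version

def sampler(store, rollouts, n):
    # Inference runs continuously; each rollout records the weight version
    # it was generated with, keeping the data near on-policy.
    for step in range(n):
        rollouts.put(("rollout", step, store.current_version()))

def trainer(store, rollouts, n):
    for _ in range(n):
        rollouts.get()     # consume fresh samples
        store.broadcast()  # in-flight weight update, no sampler pause

store = WeightStore()
rollouts = queue.Queue()
n = 5
t = threading.Thread(target=trainer, args=(store, rollouts, n))
s = threading.Thread(target=sampler, args=(store, rollouts, n))
t.start(); s.start(); s.join(); t.join()
print(store.version)  # prints 5
```

The key property is that `sampler` never waits on `trainer`: weight publication is a one-way broadcast, which is what distinguishes this design from synchronous RL loops that alternate between sampling and training phases.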

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.11 conda environment, installing PyTorch 2.6.0, and then installing the package in editable mode (pip install -e . --no-build-isolation). For enhanced inter-process communication, a Redis server can be installed. Training is launched via python -m pipelinerl.launch with specified configuration names (e.g., guessing or base_4gpu) and output directories.
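The steps above can be collected into a setup script. The version numbers, config names, and the `pip install -e . --no-build-isolation` command come from the summary; the repository URL and the exact launch flags (Hydra-style `--config-name` and `output_dir=`) are assumptions and should be checked against the repo's README.

```shell
# Setup sketch; repo URL and launch flags are assumptions, not verified.
git clone https://github.com/ServiceNow/PipelineRL.git
cd PipelineRL
conda create -n pipelinerl python=3.11 -y
conda activate pipelinerl
pip install torch==2.6.0
pip install -e . --no-build-isolation

# Optional: a Redis server for inter-process data streaming
# (install via your package manager, e.g. apt-get install redis-server)

# Launch training with a named config (e.g. guessing or base_4gpu)
# and an output directory (flag syntax assumed from Hydra conventions).
python -m pipelinerl.launch --config-name base_4gpu output_dir=results/base_4gpu
```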

Highlighted Details

  • Achieves performance matching or exceeding Open-Reasoner-Zero on AIME-2024 and MATH-500 benchmarks using 7B and 32B models.
  • Maximizes GPU utilization by balancing high inference throughput with on-policy data freshness through continuous weight updates.
  • Features a modular, Hydra-configured architecture comprising Orchestrator, Inference Servers, Actor, Preprocessor, Trainer, and Verifier components.
  • Supports both the file system (default) and Redis as backends for data streaming between pipeline stages.
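The last bullet can be illustrated with a minimal streaming backend. This is a hedged sketch under the assumption that stages exchange JSON records; `FileStream` and its methods are invented for illustration and are not PipelineRL's actual interface.

```python
import json
import os
import tempfile

# Hypothetical sketch of the default file-system streaming backend:
# pipeline stages append JSON records to a shared file. A Redis backend
# would expose the same interface, e.g. via RPUSH/LRANGE on a list key,
# avoiding the disk usage of JSONL files. Names are illustrative only.

class FileStream:
    """Append-only JSONL stream on the local file system (default backend)."""
    def __init__(self, path):
        self.path = path

    def write(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def read_all(self):
        with open(self.path) as f:
            return [json.loads(line) for line in f]

# Example: an actor stage writing rollouts for the trainer to consume.
stream = FileStream(os.path.join(tempfile.mkdtemp(), "actor_to_trainer.jsonl"))
stream.write({"prompt": "2+2=", "reward": 1.0})
stream.write({"prompt": "3+3=", "reward": 0.0})
print(len(stream.read_all()))  # prints 2
```

Because both backends share one interface, swapping the file system for Redis is a configuration change rather than a code change, which is consistent with the modular, Hydra-configured architecture described above.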

Maintenance & Community

The provided README does not detail community channels (e.g., Discord, Slack), specific contributors, sponsorships, or a public roadmap.

Licensing & Compatibility

The repository's license is not specified in the README. Consequently, its compatibility for commercial use or integration into closed-source projects remains undetermined.

Limitations & Caveats

The default file-system-based data streaming can generate substantial disk usage; Redis is recommended for more robust or distributed setups. The simplified GRPO implementation omits certain advanced RL features by default, though they can be enabled through configuration.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 10
  • Issues (30d): 1
  • Star history: 96 stars in the last 30 days
