ServiceNow: Scalable RL for training LLM agents
PipelineRL is a scalable asynchronous reinforcement learning framework for efficient LLM agent training. It addresses the classic trade-off between inference throughput and data freshness by implementing in-flight weight updates, which enables faster, more stable RL training and maximizes GPU utilization. The framework targets researchers and engineers developing LLM-based agents.
How It Works
The core innovation is an asynchronous RL pipeline that performs weight updates "in-flight": after each optimizer step, the updated model weights are broadcast to the inference servers without pausing data sampling. This keeps the training data close to on-policy while still allowing large batch sizes, which maximizes GPU utilization. PipelineRL uses a simplified GRPO algorithm that, to streamline training, omits value networks, trust-region clamping, and KL/entropy bonuses by default; KL support is available as an option.
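To make the in-flight idea concrete, here is a toy, single-process simulation. It is a sketch under stated assumptions, not PipelineRL's code: real weight broadcasting between trainer and inference servers is abstracted into a version counter, and the names `Sequence`, `simulate_inflight`, and `update_every` are hypothetical. Each tick the simulated server emits one token per active sequence and refills finished slots immediately; every `update_every` ticks the "trainer" bumps the weight version, and generation switches to it mid-sequence instead of waiting for in-progress rollouts to finish.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    """One rollout; records which (hypothetical) weight version produced each token."""
    seq_id: int
    target_len: int
    token_versions: list[int] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.token_versions) >= self.target_len


def simulate_inflight(num_ticks: int, batch_size: int, seq_len: int, update_every: int):
    """Hypothetical toy model of in-flight weight updates (not PipelineRL's API)."""
    version = 0
    next_id = 0
    active: list[Sequence] = []
    finished: list[Sequence] = []
    for tick in range(1, num_ticks + 1):
        # Refill the batch so the simulated server never idles.
        while len(active) < batch_size:
            active.append(Sequence(next_id, seq_len))
            next_id += 1
        # Generate one token per active sequence with the current weights.
        for seq in active:
            seq.token_versions.append(version)
        finished.extend(s for s in active if s.done)
        active = [s for s in active if not s.done]
        # In-flight update: a new weight version arrives without pausing sampling.
        if tick % update_every == 0:
            version += 1
    return finished, version


finished, version = simulate_inflight(num_ticks=8, batch_size=2, seq_len=4, update_every=3)
for seq in finished:
    print(seq.seq_id, seq.token_versions)
# 0 [0, 0, 0, 1]
# 1 [0, 0, 0, 1]
# 2 [1, 1, 2, 2]
# 3 [1, 1, 2, 2]
```

The output illustrates the trade-off being made: a single sequence can contain tokens from consecutive weight versions, but its lag behind the trainer is bounded by how many updates fit into one sequence's generation time, which is what keeps the data near on-policy without ever stalling the sampler.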
Quick Start & Requirements
Installation involves cloning the repository, creating a Python 3.11 conda environment, installing PyTorch 2.6.0, and then installing the package in editable mode (pip install -e . --no-build-isolation). For enhanced inter-process communication, a Redis server can be installed. Training is launched via python -m pipelinerl.launch with specified configuration names (e.g., guessing or base_4gpu) and output directories.
Maintenance & Community
The provided README does not detail community channels (e.g., Discord, Slack), specific contributors, sponsorships, or a public roadmap.
Licensing & Compatibility
The repository's license is not specified in the README. Consequently, its compatibility for commercial use or integration into closed-source projects remains undetermined.
Limitations & Caveats
The default file-system-based data streaming can consume substantial disk space; Redis is recommended for more robust or distributed setups. The simplified GRPO implementation omits some RL features by default (for example, the KL penalty), though these can be configured.
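As a rough illustration of what "simplified GRPO" means here, the sketch below shows a generic GRPO-style objective reduced in the way described above: a group-mean baseline in place of a value network, an importance ratio with no trust-region clamping, and a KL term that stays off unless a coefficient is set. This is an assumption-laden example, not PipelineRL's implementation: the function names, the optional standard-deviation normalization, and the use of the "k3" KL estimator are hypothetical choices, and entropy bonuses are not modeled.

```python
import math
from statistics import fmean, pstdev
from typing import Optional


def group_advantages(rewards: list[float], normalize_std: bool = False,
                     eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: reward minus the group mean (no value network).

    `rewards` are scores for several completions of the same prompt.
    Dividing by the group std is optional and off by default here.
    """
    baseline = fmean(rewards)
    adv = [r - baseline for r in rewards]
    if normalize_std:
        std = pstdev(rewards)
        adv = [a / (std + eps) for a in adv]
    return adv


def token_loss(logp_new: float, logp_old: float, advantage: float,
               logp_ref: Optional[float] = None, kl_coef: float = 0.0) -> float:
    """Hypothetical per-token loss: unclipped importance-weighted policy gradient.

    No PPO-style clamping of the ratio. The KL penalty against a reference
    policy (k3 estimator, always >= 0) is disabled unless kl_coef > 0.
    """
    ratio = math.exp(logp_new - logp_old)
    loss = -ratio * advantage
    if kl_coef > 0.0 and logp_ref is not None:
        log_r = logp_ref - logp_new
        loss += kl_coef * (math.exp(log_r) - log_r - 1.0)
    return loss


adv = group_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # [0.5, -0.5, -0.5, 0.5]
print(token_loss(logp_new=-1.0, logp_old=-1.0, advantage=adv[0]))  # -0.5
```

Because the baseline comes from sibling completions of the same prompt, no critic has to be trained; the cost is that every prompt must be sampled several times to form a group.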