ServiceNow: Scalable RL for training LLM agents
PipelineRL is a scalable asynchronous reinforcement learning framework for efficient LLM agent training. It addresses the classic trade-off between inference throughput and data freshness with in-flight weight updates, which enable faster, more stable RL training and keep GPUs busy. The framework targets researchers and engineers developing LLM-based agents.
How It Works
The core innovation is an asynchronous RL pipeline that performs weight updates "in-flight." After each optimizer step, updated model weights are broadcast to inference servers without pausing data sampling. This approach maintains near on-policy data while allowing for large batch sizes, thereby maximizing GPU utilization. PipelineRL employs a simplified GRPO algorithm, omitting value networks, trust-region clamping, and default KL/entropy bonuses for streamlined training, though KL support is available.
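The effect of in-flight updates on data freshness can be illustrated with a toy simulation. This is a minimal sketch and not PipelineRL code: the `InferenceServer` class, the `rollout` function, and the use of an integer "version" in place of real model weights are all hypothetical, and the baseline it compares against (weights swapped only between sequences) is an assumption made for contrast.

```python
from dataclasses import dataclass


@dataclass
class InferenceServer:
    """Toy stand-in for an inference server (hypothetical).

    `version` plays the role of the model weights currently loaded.
    """
    version: int = 0

    def receive_weights(self, version: int) -> None:
        # In a real system this would be a weight broadcast from the trainer.
        self.version = version

    def sample_token(self) -> int:
        # Return the weight version that produced this token.
        return self.version


def rollout(seq_len: int, tokens_per_optimizer_step: int, in_flight: bool):
    """Generate one sequence while the trainer keeps taking optimizer steps.

    Returns (per-token weight versions, final trainer version).
    """
    server = InferenceServer()
    trainer_version = 0
    token_versions = []
    for t in range(1, seq_len + 1):
        token_versions.append(server.sample_token())
        if t % tokens_per_optimizer_step == 0:
            trainer_version += 1  # one optimizer step finished
            if in_flight:
                # Push new weights without pausing the ongoing sequence.
                server.receive_weights(trainer_version)
    return token_versions, trainer_version


def mean_lag(token_versions, trainer_version) -> float:
    """Average number of optimizer steps each token is behind the trainer."""
    return sum(trainer_version - v for v in token_versions) / len(token_versions)


if __name__ == "__main__":
    for mode in (True, False):
        versions, final = rollout(seq_len=8, tokens_per_optimizer_step=2, in_flight=mode)
        print(f"in_flight={mode}: versions={versions}, mean lag={mean_lag(versions, final)}")
```

In this toy setting, the sequence sampled with in-flight updates is produced by a mixture of weight versions, and its later tokens come from more recent weights, so the average lag behind the trainer is smaller than when the server keeps its starting weights for the whole sequence.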
Quick Start & Requirements
Installation involves cloning the repository, creating a Python 3.11 conda environment, installing PyTorch 2.6.0, and then installing the package in editable mode (pip install -e . --no-build-isolation). For enhanced inter-process communication, a Redis server can be installed. Training is launched via python -m pipelinerl.launch with specified configuration names (e.g., guessing or base_4gpu) and output directories.
Maintenance & Community
The provided README does not detail community channels (e.g., Discord, Slack), specific contributors, sponsorships, or a public roadmap.
Licensing & Compatibility
The repository's license is not specified in the README. Consequently, its compatibility for commercial use or integration into closed-source projects remains undetermined.
Limitations & Caveats
The default filesystem-based data streaming can generate substantial disk usage; Redis is recommended for more robust or distributed setups. The simplified GRPO implementation omits certain advanced RL features by default, though they can be configured.