This repository provides a framework for decentralized Reinforcement Learning (RL) training at scale, targeting researchers and engineers working with large language models (LLMs) and complex RL tasks. It enables distributed training and inference across multiple nodes and GPUs, aiming to simplify and accelerate the development of advanced AI models.
How It Works
The project leverages a decentralized architecture, allowing training and inference processes to run independently and communicate across a distributed system. It uses `torchrun` for distributed training and `vLLM` for efficient inference, and supports parallelization strategies including Tensor Parallelism (TP), Pipeline Parallelism (PP), and Data Parallelism (DP). This approach is designed to maximize hardware utilization and scalability for large-scale RL experiments.
Quick Start & Requirements
- Install: `curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/install.sh | bash`
- Environment: Requires Python 3.10+ and the `uv` package manager; `flash_attn` must be installed and functional.
- Hardware: Primarily designed for multi-GPU setups (NVIDIA GPUs with CUDA). Specific examples demonstrate configurations for 2, 4, and 8 GPUs, including multi-node setups.
- Docs: [Not explicitly linked, but examples cover setup and distributed inference.]
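A quick sanity check along the lines of the requirements above can be scripted. This is an illustrative sketch, not a script shipped with the repository:

```python
import shutil
import sys
from importlib.util import find_spec

def check_environment() -> list[str]:
    """Return human-readable problems with the local setup, per the README requirements."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("uv") is None:
        # uv is a standalone CLI tool, so we look for it on PATH
        problems.append("the uv package manager is not on PATH")
    if find_spec("flash_attn") is None:
        problems.append("flash_attn is not installed")
    return problems

for problem in check_environment():
    print(f"warning: {problem}")
```

An empty return value means the environment matches the stated requirements; anything printed should be resolved before launching training.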
Highlighted Details
- Supports distributed inference with TP, PP, and DP, including combinations like TP+PP and TP+DP.
- Provides detailed examples for running inference and training across various GPU configurations and multi-node setups.
- Includes a citation for the "INTELLECT-2" model, trained using this framework.
- Offers comprehensive test suites (unit, integration, GPU-specific, fast/slow tests).
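The supported combinations above (TP+PP and TP+DP, but not DP+PP) can be captured in a small validation helper. This is a hypothetical sketch, not part of the framework's API:

```python
def validate_parallelism(tp: int = 1, pp: int = 1, dp: int = 1) -> None:
    """Reject the DP+PP combination, which the README notes is unsupported."""
    if dp > 1 and pp > 1:
        raise ValueError("DP+PP is not supported; use TP+PP or TP+DP instead.")

validate_parallelism(tp=2, pp=2)  # TP+PP: accepted
validate_parallelism(tp=4, dp=2)  # TP+DP: accepted
```

Calling it with both `pp > 1` and `dp > 1` raises a `ValueError`, mirroring the caveat listed under Limitations.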
Maintenance & Community
- The project is associated with the "Prime Intellect Team" and lists several authors in its citation.
- No explicit links to community channels (Discord, Slack) or a public roadmap are provided in the README.
Licensing & Compatibility
- The README does not explicitly state a license; the provided citation suggests academic use is intended, and commercial-use implications are unclear without one.
Limitations & Caveats
- The README does not specify a license, which may impact commercial adoption.
- DP+PP configurations are explicitly noted as unsupported.
- The project appears to be in active development, with examples demonstrating specific model and dataset configurations.