Decentralized RL training codebase for scaling
This repository provides a framework for decentralized Reinforcement Learning (RL) training at scale, targeting researchers and engineers working with large language models (LLMs) and complex RL tasks. It enables distributed training and inference across multiple nodes and GPUs, aiming to simplify and accelerate the development of advanced AI models.
How It Works
The project uses a decentralized architecture in which training and inference processes run independently and communicate across a distributed system. It uses torchrun for distributed training and vLLM for efficient inference, supporting parallelization strategies such as Tensor Parallelism (TP), Pipeline Parallelism (PP), and Data Parallelism (DP). This approach is designed to maximize hardware utilization and scalability for large-scale RL experiments.
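To make the relationship between TP, PP, and DP concrete, here is a minimal sketch of how a pool of GPUs can be split across the three strategies. The `ParallelLayout` class and its rank ordering (TP innermost, DP outermost) are illustrative assumptions; they are not taken from this project or from vLLM, which may order ranks differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: sizes of the TP, PP, and DP groups."""
    tp: int  # GPUs that share each layer's tensors
    pp: int  # pipeline stages, each holding a slice of the layers
    dp: int  # full model replicas processing different data

    @property
    def world_size(self) -> int:
        # Every GPU belongs to exactly one (dp, pp, tp) coordinate.
        return self.tp * self.pp * self.dp

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a global rank to (dp_index, pp_index, tp_index).

        Assumes TP varies fastest and DP slowest; this is a convention
        chosen for the sketch, not a rule.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside 0..{self.world_size - 1}")
        tp_index = rank % self.tp
        pp_index = (rank // self.tp) % self.pp
        dp_index = rank // (self.tp * self.pp)
        return dp_index, pp_index, tp_index


layout = ParallelLayout(tp=2, pp=2, dp=2)
for rank in range(layout.world_size):
    print(rank, layout.coords(rank))
```

With this ordering, GPUs in the same TP group are adjacent ranks, which typically places them on the same node where interconnect bandwidth is highest.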
Quick Start & Requirements
curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/install.sh | bash
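Before or after running the installer, it can help to confirm the two prerequisites this section names: the uv package manager and flash_attn. The following is a small preflight sketch using only the Python standard library; the function name `check_prereqs` is hypothetical and not part of the project.

```python
import importlib.util
import shutil


def check_prereqs() -> dict[str, bool]:
    """Report whether uv is on PATH and flash_attn is importable."""
    return {
        # shutil.which returns the executable path, or None if not found.
        "uv": shutil.which("uv") is not None,
        # find_spec locates the package without importing it.
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

This only checks that flash_attn is installed, not that it is functional; importing it and running a small attention call on a GPU would be a stronger test.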
Requires the uv package manager. flash_attn must be installed and functional.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats