siiRL by sii-research

Scalable distributed RL framework for advanced LLMs and multi-agent systems

Created 7 months ago
347 stars

Top 80.4% on SourcePulse

View on GitHub
Project Summary

siiRL is a fully distributed reinforcement learning framework designed to overcome scaling limitations in LLM post-training and multi-agent systems. It targets researchers and engineers needing high-throughput, scalable RL solutions, offering near-linear scalability to thousands of GPUs and flexible workflow definition via Directed Acyclic Graphs (DAGs).

How It Works

siiRL employs a novel multi-controller paradigm, eliminating the centralized controller bottleneck found in other frameworks. Its architecture comprises a DAG Planner, DAG Workers (each bound to a single GPU), and a Data Coordinator with distributed dataloaders and databuffers. This fully distributed dataflow design minimizes communication overhead, enabling efficient data management and near-linear scalability across large GPU clusters. The DAG-defined pipeline decouples algorithmic logic from hardware, facilitating rapid experimentation.
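To make the DAG-defined pipeline idea concrete, here is a minimal, illustrative sketch of executing named pipeline stages in dependency order. This is a toy abstraction, not siiRL's actual API; all class and stage names are invented for the example.

```python
# Illustrative sketch only: a toy DAG pipeline, NOT siiRL's actual API.
from collections import defaultdict, deque

class DAGPipeline:
    """Nodes are named stage functions; edges express data dependencies."""
    def __init__(self):
        self.stages = {}               # name -> callable(inputs_dict) -> output
        self.deps = defaultdict(list)  # name -> upstream stage names

    def add_stage(self, name, fn, deps=()):
        self.stages[name] = fn
        self.deps[name] = list(deps)

    def run(self, initial):
        # Kahn's algorithm: execute stages in topological order.
        indegree = {n: len(self.deps[n]) for n in self.stages}
        downstream = defaultdict(list)
        for n, ups in self.deps.items():
            for u in ups:
                downstream[u].append(n)
        ready = deque(n for n, d in indegree.items() if d == 0)
        results = {}
        while ready:
            n = ready.popleft()
            inputs = {u: results[u] for u in self.deps[n]} or {"init": initial}
            results[n] = self.stages[n](inputs)
            for d in downstream[n]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        return results

# A typical RL post-training loop as a DAG: rollout -> reward -> update.
pipe = DAGPipeline()
pipe.add_stage("rollout", lambda x: "trajectories")
pipe.add_stage("reward", lambda x: f"scored {x['rollout']}", deps=["rollout"])
pipe.add_stage("update", lambda x: f"policy updated on {x['reward']}", deps=["reward"])
print(pipe.run("prompts")["update"])  # -> policy updated on scored trajectories
```

Because algorithmic logic lives only in the stage functions and the edges between them, the same graph could in principle be scheduled onto one GPU or many, which is the decoupling the architecture aims for.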

Quick Start & Requirements

  • Installation: Details are available in the Documentation and Quickstart.
  • Prerequisites: Requires GPU hardware. Officially supports Huawei Ascend NPUs alongside GPUs. PyTorch, Ray, vLLM, vLLM-Ascend, and SGLang are foundational dependencies.
  • Resource Footprint: Designed for large-scale clusters, scaling up to 1024 GPUs.
  • Links: 📄 Paper, 📚 Documentation, Feishu Group, Wechat Group.

Highlighted Details

  • Achieves near-linear scalability up to 1024 GPUs with over 90% efficiency, significantly outperforming baseline frameworks in data-intensive scenarios (e.g., up to a 2.62x throughput improvement with GRPO).
  • Supports training Vision-Language-Action (VLA) models with SRPO for embodied RL and integrates Megatron training backend with MoE support (validated on Qwen3-MoE).
  • Demonstrates robust performance on long-context tasks and large models (7B-72B), showing comparable model convergence to baselines while reducing training time.
  • Offers cross-hardware compatibility, including official support for Huawei Ascend NPUs.
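The 90%-efficiency figure above can be made concrete with simple arithmetic: scaling efficiency is the achieved speedup divided by the ideal linear speedup. The throughput numbers below are invented for illustration; only the 90%-at-1024-GPUs relationship comes from the project's claims.

```python
# Illustrative arithmetic only; throughput figures are assumed, not measured.
def scaling_efficiency(throughput_n: float, throughput_1: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved on n_gpus."""
    return (throughput_n / throughput_1) / n_gpus

# If one GPU sustained 100 samples/s, ideal 1024-GPU throughput would be 102,400.
# 90% efficiency would mean roughly 92,160 samples/s in practice.
eff = scaling_efficiency(92_160, 100, 1024)
print(f"{eff:.0%}")  # -> 90%
```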

Maintenance & Community

The project is under active development, with recent updates focusing on VLA training, multi-agent capabilities, and base framework enhancements. Community contributions are welcomed via the Contributing Guide.

Licensing & Compatibility

The provided README does not explicitly state the software license. This lack of clear licensing information may complicate commercial use or integration into closed-source projects.

Limitations & Caveats

The absence of a specified open-source license is a significant adoption blocker. While the framework is actively developed with promising features, its maturity for some use cases, particularly advanced multi-agent systems and VLA training, is still evolving per the project's roadmap.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 18 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 3 more.

ROLL by alibaba

  • Top 0.9% on SourcePulse
  • 3k stars
  • RL library for large language models
  • Created 9 months ago
  • Updated 1 day ago