siiRL by sii-research

Scalable distributed RL framework for advanced LLMs and multi-agent systems

Created 4 months ago
276 stars

Top 93.9% on SourcePulse

View on GitHub
Project Summary

siiRL is a fully distributed reinforcement learning framework designed to overcome scaling limitations in LLM post-training and multi-agent systems. It targets researchers and engineers needing high-throughput, scalable RL solutions, offering near-linear scalability to thousands of GPUs and flexible workflow definition via Directed Acyclic Graphs (DAGs).

How It Works

siiRL employs a novel multi-controller paradigm, eliminating the centralized controller bottleneck found in other frameworks. Its architecture comprises a DAG Planner, DAG Workers (each bound to a single GPU), and a Data Coordinator with distributed dataloaders and databuffers. This fully distributed dataflow design minimizes communication overhead, enabling efficient data management and near-linear scalability across large GPU clusters. The DAG-defined pipeline decouples algorithmic logic from hardware, facilitating rapid experimentation.
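The decoupled, DAG-defined pipeline described above can be illustrated with a minimal sketch. This is a hypothetical illustration of the pattern, not siiRL's actual API: `Node`, `DAGPlanner`, and `DAGWorker` are invented names, and the stages are toy stand-ins for rollout generation, reward scoring, and a policy update.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of a DAG-defined RL pipeline (not siiRL's real API):
# each node is a stage, and each worker executes the graph locally, so no
# centralized controller sits in the data path.

@dataclass
class Node:
    name: str
    fn: Callable
    deps: List[str] = field(default_factory=list)

class DAGPlanner:
    """Resolves a topological execution order for the pipeline stages."""
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def topo_order(self):
        order, seen = [], set()
        def visit(name):
            if name in seen:
                return
            seen.add(name)
            for dep in self.nodes[name].deps:
                visit(dep)
            order.append(name)
        for name in self.nodes:
            visit(name)
        return order

class DAGWorker:
    """One worker per GPU: runs every stage locally, passing data along edges."""
    def __init__(self, planner):
        self.planner = planner

    def run(self, batch):
        results = {"input": batch}
        for name in self.planner.topo_order():
            node = self.planner.nodes[name]
            inputs = [results[d] for d in node.deps] or [results["input"]]
            results[name] = node.fn(*inputs)
        return results

# Toy stages: generate rollouts, score them, average rewards as the "update".
pipeline = DAGPlanner([
    Node("rollout", lambda prompts: [p + " -> response" for p in prompts]),
    Node("reward", lambda rollouts: [float(len(r)) for r in rollouts], deps=["rollout"]),
    Node("update", lambda rewards: sum(rewards) / len(rewards), deps=["reward"]),
])
worker = DAGWorker(pipeline)
out = worker.run(["q1", "q2"])
```

Because algorithmic logic lives entirely in the graph definition, swapping an algorithm means editing nodes and edges, not hardware orchestration code.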

Quick Start & Requirements

  • Installation: Details are available in the Documentation and Quickstart.
  • Prerequisites: Requires GPU hardware. Officially supports Huawei Ascend NPUs alongside GPUs. PyTorch, Ray, vLLM, vLLM-Ascend, and SGLang are foundational dependencies.
  • Resource Footprint: Designed for large-scale clusters, scaling up to 1024 GPUs.
  • Links: 📄 Paper, 📚 Documentation, Feishu Group, Wechat Group.

Highlighted Details

  • Achieves near-linear scalability up to 1024 GPUs with over 90% efficiency, significantly outperforming baseline frameworks in data-intensive scenarios (e.g., up to 2.62x throughput improvement with GRPO).
  • Supports training Vision-Language-Action (VLA) models with SRPO for embodied RL and integrates Megatron training backend with MoE support (validated on Qwen3-MoE).
  • Demonstrates robust performance on long-context tasks and large models (7B-72B), showing comparable model convergence to baselines while reducing training time.
  • Offers cross-hardware compatibility, including official support for Huawei Ascend NPUs.
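GRPO, cited in the throughput comparison above, replaces a learned critic with a group-relative baseline: several responses are sampled per prompt, and each reward is normalized against its group's mean and standard deviation. A minimal sketch of that advantage computation (illustrative only; `grpo_advantages` is not a siiRL function):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: normalize each sampled response's reward
    by its group's mean and (population) standard deviation. This is the
    core critic-free baseline idea behind GRPO, sketched for illustration."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four sampled responses to one prompt, each with a scalar reward:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Advantages within a group sum to (approximately) zero, so above-average responses are reinforced and below-average ones are penalized without training a value model.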

Maintenance & Community

The project is under active development, with recent updates focusing on VLA training, multi-agent capabilities, and base framework enhancements. Community contributions are welcomed via the Contributing Guide.

Licensing & Compatibility

The provided README does not explicitly state the software license. The absence of clear licensing information poses a risk for commercial use or integration into closed-source projects.

Limitations & Caveats

The absence of a specified open-source license is a significant adoption blocker. While the framework is actively developed with promising features, some capabilities, particularly advanced multi-agent systems and VLA training, are still maturing and depend on the project's future roadmap.

Health Check
Last Commit

1 hour ago

Responsiveness

Inactive

Pull Requests (30d)
7
Issues (30d)
1
Star History
56 stars in the last 30 days

