siiRL by sii-research

Scalable distributed RL framework for advanced LLMs and multi-agent systems

Created 4 months ago
276 stars

Top 93.9% on SourcePulse

View on GitHub
Project Summary

siiRL is a fully distributed reinforcement learning framework designed to overcome scaling limitations in LLM post-training and multi-agent systems. It targets researchers and engineers needing high-throughput, scalable RL solutions, offering near-linear scalability to thousands of GPUs and flexible workflow definition via Directed Acyclic Graphs (DAGs).

How It Works

siiRL employs a novel multi-controller paradigm, eliminating the centralized controller bottleneck found in other frameworks. Its architecture comprises a DAG Planner, DAG Workers (each bound to a single GPU), and a Data Coordinator with distributed dataloaders and databuffers. This fully distributed dataflow design minimizes communication overhead, enabling efficient data management and near-linear scalability across large GPU clusters. The DAG-defined pipeline decouples algorithmic logic from hardware, facilitating rapid experimentation.
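The decoupled, DAG-defined pipeline described above can be illustrated with a minimal sketch. This is a hypothetical illustration of the pattern, not siiRL's actual API: `Node`, `DAGPlanner`, and `DAGWorker` are invented names, and the stages are toy stand-ins for rollout generation, reward scoring, and a policy update.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of a DAG-defined RL pipeline (not siiRL's real API):
# each node is a stage, and each worker executes the graph locally, so no
# centralized controller sits in the data path.

@dataclass
class Node:
    name: str
    fn: Callable
    deps: List[str] = field(default_factory=list)

class DAGPlanner:
    """Resolves a topological execution order for the pipeline stages."""
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def topo_order(self):
        order, seen = [], set()
        def visit(name):
            if name in seen:
                return
            seen.add(name)
            for dep in self.nodes[name].deps:
                visit(dep)
            order.append(name)
        for name in self.nodes:
            visit(name)
        return order

class DAGWorker:
    """One worker per GPU: runs every stage locally, passing data along edges."""
    def __init__(self, planner):
        self.planner = planner

    def run(self, batch):
        results = {"input": batch}
        for name in self.planner.topo_order():
            node = self.planner.nodes[name]
            inputs = [results[d] for d in node.deps] or [results["input"]]
            results[name] = node.fn(*inputs)
        return results

# Toy stages: generate rollouts, score them, average rewards as the "update".
pipeline = DAGPlanner([
    Node("rollout", lambda prompts: [p + " -> response" for p in prompts]),
    Node("reward", lambda rollouts: [float(len(r)) for r in rollouts], deps=["rollout"]),
    Node("update", lambda rewards: sum(rewards) / len(rewards), deps=["reward"]),
])
worker = DAGWorker(pipeline)
out = worker.run(["q1", "q2"])
```

Because algorithmic logic lives entirely in the graph definition, swapping an algorithm means editing nodes and edges, not hardware orchestration code.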

Quick Start & Requirements

  • Installation: Details are available in the Documentation and Quickstart.
  • Prerequisites: Requires GPU hardware. Officially supports Huawei Ascend NPUs alongside GPUs. PyTorch, Ray, vLLM, vLLM-Ascend, and SGLang are foundational dependencies.
  • Resource Footprint: Designed for large-scale clusters, scaling up to 1024 GPUs.
  • Links: 📄 Paper, 📚 Documentation, Feishu Group, Wechat Group.

Highlighted Details

  • Achieves near-linear scalability up to 1024 GPUs with over 90% efficiency, significantly outperforming baseline frameworks in data-intensive scenarios (e.g., up to 2.62x throughput improvement with GRPO).
  • Supports training Vision-Language-Action (VLA) models with SRPO for embodied RL and integrates Megatron training backend with MoE support (validated on Qwen3-MoE).
  • Demonstrates robust performance on long-context tasks and large models (7B-72B), showing comparable model convergence to baselines while reducing training time.
  • Offers cross-hardware compatibility, including official support for Huawei Ascend NPUs.
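GRPO, cited in the throughput comparison above, replaces a learned critic with a group-relative baseline: several responses are sampled per prompt, and each reward is normalized against its group's mean and standard deviation. A minimal sketch of that advantage computation (illustrative only; `grpo_advantages` is not a siiRL function):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: normalize each sampled response's reward
    by its group's mean and (population) standard deviation. This is the
    core critic-free baseline idea behind GRPO, sketched for illustration."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four sampled responses to one prompt, each with a scalar reward:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Advantages within a group sum to (approximately) zero, so above-average responses are reinforced and below-average ones are penalized without training a value model.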

Maintenance & Community

The project is under active development, with recent updates focusing on VLA training, multi-agent capabilities, and base framework enhancements. Community contributions are welcomed via the Contributing Guide.

Licensing & Compatibility

The provided README does not explicitly state the software license. The absence of clear licensing information poses a risk for commercial use or integration into closed-source projects.

Limitations & Caveats

The absence of a specified open-source license is a significant adoption blocker. While the framework is actively developed with promising features, some capabilities, particularly advanced multi-agent systems and VLA training, are still maturing and depend on the project's future roadmap.

Health Check
Last Commit

1 hour ago

Responsiveness

Inactive

Pull Requests (30d)
7
Issues (30d)
1
Star History
56 stars in the last 30 days

