RLinf by RLinf

Reinforcement learning infrastructure for agentic AI

Created 1 month ago
388 stars

Top 73.9% on SourcePulse

Project Summary

RLinf is an open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) using reinforcement learning. It provides a flexible and scalable backbone for developing agentic AI, enabling open-ended learning and continuous generalization. The system is particularly beneficial for researchers and developers working on advanced AI training paradigms.

How It Works

RLinf introduces a novel "Macro-to-Micro Flow" (M2Flow) paradigm, which separates the logical workflow construction from physical communication and scheduling. This allows for programmable, high-level logical flows to be executed efficiently through micro-level operations. It supports flexible execution modes (Collocated, Disaggregated, Hybrid) and an automatic scheduling strategy that selects the optimal mode based on the training workload, eliminating the need for manual resource allocation.
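The three execution modes can be pictured with a toy scheduler. Everything below (the names, the memory-based heuristic) is illustrative only and is not RLinf's actual API; it merely sketches the kind of decision an automatic mode selector makes.

```python
from enum import Enum

class Mode(Enum):
    COLLOCATED = "collocated"        # rollout generation and training share the same GPUs
    DISAGGREGATED = "disaggregated"  # generation and training run on separate GPU pools
    HYBRID = "hybrid"                # some components share GPUs, others are split

def pick_mode(gen_mem_gb: float, train_mem_gb: float, gpu_mem_gb: float,
              pipeline_friendly: bool) -> Mode:
    """Toy heuristic: collocate when both phases fit in one GPU pool's
    memory; otherwise prefer hybrid when the workload pipelines well,
    and fall back to fully disaggregated execution."""
    if gen_mem_gb + train_mem_gb <= gpu_mem_gb:
        return Mode.COLLOCATED
    if pipeline_friendly:
        return Mode.HYBRID
    return Mode.DISAGGREGATED
```

The point of the automatic strategy described above is that this choice is made per workload by the system rather than hard-coded by the user.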

Quick Start & Requirements

  • Installation: Installation instructions are in the README, with quickstart guides for PPO training of VLAs on ManiSkill3 and GRPO training of LLMs on MATH.
  • Prerequisites: Supports FSDP + Hugging Face backends for rapid adaptation and Megatron + SGLang for large-scale training. Compatibility with mainstream CPU & GPU-based simulators like ManiSkill3 and LIBERO is provided.
  • Resources: The README reports a 120%+ throughput improvement from the hybrid mode with fine-grained pipelining, and a further 20-40% efficiency gain from automatic online scaling.

Highlighted Details

  • Supports fast adaptation for VLA models like OpenVLA and π₀.
  • Enables RL fine-tuning of the π₀ model family with a flow-matching action expert.
  • Offers built-in support for popular RL methods including PPO, GRPO, DAPO, and Reinforce++.
  • Integrates LoRA for efficient fine-tuning and supports 5D Parallelism for Megatron-LM.
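Of the built-in methods, GRPO is the simplest to sketch: it drops PPO's learned value network and instead normalizes each sampled completion's reward against the other completions for the same prompt. The snippet below is a minimal illustration of that group-relative advantage computation, not RLinf code.

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: each completion sampled for a
    prompt is scored against the mean and std of its own group, so no
    learned critic is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline comes from the group itself, a uniformly scored group yields all-zero advantages and contributes no gradient signal.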

Maintenance & Community

RLinf is a new project, with its formal v0.1 release and accompanying paper expected soon. It acknowledges inspiration from projects including VeRL, AReaL, Megatron-LM, SGLang, and PyTorch FSDP. Contact information is provided for inquiries and potential collaborations.

Licensing & Compatibility

The README does not explicitly state the license type or compatibility for commercial use.

Limitations & Caveats

The project is in its early stages, with a formal v0.1 release and paper forthcoming. The roadmap indicates planned support for heterogeneous GPUs, asynchronous pipeline execution, Mixture of Experts (MoE), vLLM inference backend, and various VLM/VLA training extensions, suggesting these features are not yet available.

Health Check
Last Commit

21 hours ago

Responsiveness

Inactive

Pull Requests (30d)
88
Issues (30d)
21
Star History
400 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4%
265
Efficiently train foundation models with PyTorch
Created 1 year ago
Updated 1 month ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jiayi Pan (Author of SWE-Gym; MTS at xAI).

Pai-Megatron-Patch by alibaba

0.7%
1k
Training toolkit for LLMs & VLMs using Megatron
Created 2 years ago
Updated 1 day ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1%
2k
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

0.5%
10k
CLI tool for streamlined post-training of AI models
Created 2 years ago
Updated 20 hours ago