RLinf by RLinf

Reinforcement learning infrastructure for agentic AI

Created 1 month ago
388 stars

Top 73.9% on SourcePulse

Project Summary

RLinf is an open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) using reinforcement learning. It provides a flexible and scalable backbone for developing agentic AI, enabling open-ended learning and continuous generalization. The system is particularly beneficial for researchers and developers working on advanced AI training paradigms.

How It Works

RLinf introduces a novel "Macro-to-Micro Flow" (M2Flow) paradigm, which separates the logical workflow construction from physical communication and scheduling. This allows for programmable, high-level logical flows to be executed efficiently through micro-level operations. It supports flexible execution modes (Collocated, Disaggregated, Hybrid) and an automatic scheduling strategy that selects the optimal mode based on the training workload, eliminating the need for manual resource allocation.
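The three execution modes can be pictured with a toy scheduler. Everything below (the names, the memory-based heuristic) is illustrative only and is not RLinf's actual API; it merely sketches the kind of decision an automatic mode selector makes.

```python
from enum import Enum

class Mode(Enum):
    COLLOCATED = "collocated"        # rollout generation and training share the same GPUs
    DISAGGREGATED = "disaggregated"  # generation and training run on separate GPU pools
    HYBRID = "hybrid"                # some components share GPUs, others are split

def pick_mode(gen_mem_gb: float, train_mem_gb: float, gpu_mem_gb: float,
              pipeline_friendly: bool) -> Mode:
    """Toy heuristic: collocate when both phases fit in one GPU pool's
    memory; otherwise prefer hybrid when the workload pipelines well,
    and fall back to fully disaggregated execution."""
    if gen_mem_gb + train_mem_gb <= gpu_mem_gb:
        return Mode.COLLOCATED
    if pipeline_friendly:
        return Mode.HYBRID
    return Mode.DISAGGREGATED
```

The point of the automatic strategy described above is that this choice is made per workload by the system rather than hard-coded by the user.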

Quick Start & Requirements

  • Installation: Installation instructions are in the README, with quickstart guides for PPO training of VLAs on ManiSkill3 and GRPO training of LLMs on MATH.
  • Prerequisites: Supports FSDP + Hugging Face backends for rapid adaptation and Megatron + SGLang for large-scale training. Compatibility with mainstream CPU & GPU-based simulators like ManiSkill3 and LIBERO is provided.
  • Resources: The README reports a 120%+ throughput improvement from the hybrid mode with fine-grained pipelining, and a further 20-40% efficiency gain from automatic online scaling.

Highlighted Details

  • Supports fast adaptation for VLA models like OpenVLA and π₀.
  • Enables RL fine-tuning of the π₀ model family with a flow-matching action expert.
  • Offers built-in support for popular RL methods including PPO, GRPO, DAPO, and Reinforce++.
  • Integrates LoRA for efficient fine-tuning and supports 5D Parallelism for Megatron-LM.
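Of the built-in methods, GRPO is the simplest to sketch: it drops PPO's learned value network and instead normalizes each sampled completion's reward against the other completions for the same prompt. The snippet below is a minimal illustration of that group-relative advantage computation, not RLinf code.

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: each completion sampled for a
    prompt is scored against the mean and std of its own group, so no
    learned critic is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline comes from the group itself, a uniformly scored group yields all-zero advantages and contributes no gradient signal.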

Maintenance & Community

RLinf is a new project, with its formal v0.1 release and accompanying paper expected soon. It acknowledges inspiration from projects including VeRL, AReaL, Megatron-LM, SGLang, and PyTorch FSDP. Contact information is provided for inquiries and potential collaborations.

Licensing & Compatibility

The README does not explicitly state the license type or compatibility for commercial use.

Limitations & Caveats

The project is in its early stages, with a formal v0.1 release and paper forthcoming. The roadmap indicates planned support for heterogeneous GPUs, asynchronous pipeline execution, Mixture of Experts (MoE), vLLM inference backend, and various VLM/VLA training extensions, suggesting these features are not yet available.

Health Check
Last Commit

21 hours ago

Responsiveness

Inactive

Pull Requests (30d)
88
Issues (30d)
21
Star History
400 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4%
265
Efficiently train foundation models with PyTorch
Created 1 year ago
Updated 1 month ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jiayi Pan (Author of SWE-Gym; MTS at xAI).

Pai-Megatron-Patch by alibaba

0.7%
1k
Training toolkit for LLMs & VLMs using Megatron
Created 2 years ago
Updated 1 day ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1%
2k
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

0.5%
10k
CLI tool for streamlined post-training of AI models
Created 2 years ago
Updated 20 hours ago