ReinFlow: Flow matching policy fine-tuning via online RL
Summary
ReinFlow offers a flexible framework for fine-tuning flow matching policies using online reinforcement learning, specifically supporting Vision-Language-Action (VLA) models. It enables researchers and engineers to enhance pre-trained imitation learning policies with RL, improving performance on complex robotic tasks. The core benefit is efficient adaptation of flow-based models to downstream RL objectives.
How It Works
The key innovation is an end-to-end trained noise injection network, which makes policy probabilities tractable even with very few denoising steps (as few as 1-4). ReinFlow first trains policies via imitation learning (behavior cloning), then fine-tunes them with online RL. The approach is robust to the discretization and Monte Carlo approximation errors inherent in few-step diffusion processes.
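The mechanism can be sketched in a toy NumPy example (an illustration only, not the actual ReinFlow implementation; `velocity` and `noise_std` are hypothetical stand-ins for the pre-trained flow-matching velocity network and the learned noise-injection network). Injecting Gaussian noise at each Euler denoising step turns the deterministic flow into a discrete-time Markov process whose per-step transitions are Gaussian, so the trajectory log-probability that policy-gradient RL needs is tractable:

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x, t):
    # stand-in for the pre-trained flow-matching velocity network
    return -x * (1.0 - t)

def noise_std(x, t):
    # stand-in for the noise-injection network; in ReinFlow this is
    # trained end-to-end during RL fine-tuning
    return 0.1 + 0.05 * t

def sample_action(x0, n_steps=4):
    """Few-step denoising with injected Gaussian noise.

    Each step is a Gaussian transition, so the log-probability of the
    sampled action trajectory can be accumulated in closed form.
    """
    x, logp = x0, 0.0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        mean = x + velocity(x, t) * dt   # deterministic Euler step
        std = noise_std(x, t)            # learned exploration noise
        x = mean + std * rng.standard_normal(x.shape)
        # accumulate the Gaussian log-density of this transition
        logp += np.sum(-0.5 * ((x - mean) / std) ** 2
                       - np.log(std) - 0.5 * np.log(2 * np.pi))
    return x, logp

action, logp = sample_action(rng.standard_normal(2))
```

The returned `logp` is what an online RL objective (e.g. a policy-gradient loss) would differentiate through with respect to the noise network's parameters.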
Quick Start & Requirements
Installation is detailed in installation/reinflow-setup.md, with experiment reproduction guides in ReproduceExps.md and ReproduceFigs.md. Specific dependencies such as CUDA versions aren't listed, but the project's scale (models up to 3B parameters, extensive robotics benchmarks) implies significant computational resources, likely high-end GPUs. A project website and an arXiv paper (arXiv:2510.25889) are available.
Maintenance & Community
Authored by Tonghe Zhang et al., with contributions from the RLinf project. Code, checkpoints, and documentation are fully released. Direct community channels (e.g., Discord, Slack) are not specified.
Licensing & Compatibility
Released under the permissive MIT license, allowing broad compatibility for commercial use and integration into closed-source projects.
Limitations & Caveats
ReinFlow is explicitly designed for RL fine-tuning of flow matching policies that have already been pre-trained via imitation learning; it is not intended for training policies from scratch and may not be suitable for initial pre-training.