VLM reinforcement learning framework
This repository provides V-Triune, a unified Reinforcement Learning (RL) system for advancing Vision-Language Models (VLMs). It enables VLMs to jointly master visual reasoning and perception tasks within a single training pipeline, offering significant performance gains on benchmarks like MEGA-Bench Core. The system is designed for researchers and engineers working on multimodal AI and VLM development.
How It Works
V-Triune unifies diverse VLM tasks through three core components: sample-level data formatting, verifier-level reward computation, and source-level metric monitoring. It introduces a novel Dynamic IoU reward mechanism for improved stability and performance on perception tasks. This unified RL approach allows a single framework to handle both reasoning (e.g., math, puzzles) and perception (e.g., detection, grounding) tasks simultaneously.
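To make the Dynamic IoU idea concrete, here is a minimal sketch of a verifier-style reward for detection outputs. The threshold schedule values and the binary pass/fail rule are illustrative assumptions, not the project's exact configuration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def dynamic_iou_reward(pred_box, gt_box, step,
                       schedule=((0, 0.5), (1000, 0.75), (2000, 0.95))):
    """Binary reward: 1.0 if IoU clears the current threshold, else 0.0.

    The threshold tightens as training progresses (schedule is a
    hypothetical example), easing early exploration while demanding
    precise localization later -- the intuition behind Dynamic IoU.
    """
    threshold = max(t for s, t in schedule if step >= s)
    return 1.0 if iou(pred_box, gt_box) >= threshold else 0.0
```

Early in training a loose overlap already earns reward; the same prediction may score zero once the threshold has been raised.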
Quick Start & Requirements
Install the package from source with pip install -e . (compiling native dependencies may require ninja). Docker is also supported. Model weights can be fetched with huggingface-cli, and the reward server is launched separately via scripts/reward_server.sh.
Maintenance & Community
The project is actively developed by MiniMax AI. Updates and releases are announced via the repository. Further community engagement channels are not explicitly listed.
Licensing & Compatibility
The repository and associated models are publicly available, encouraging research and development. Specific licensing details for commercial use or redistribution are not detailed in the README.
Limitations & Caveats
The setup requires a distributed Ray cluster and a separate reward server, adding complexity to deployment. The project relies on specific versions of PyTorch and FlashAttention, which may require careful environment management. Training configuration involves numerous environment variables that must be correctly set.
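Since training spans a Ray cluster plus a standalone reward server, a minimal deployment sketch might look like the following. The Ray commands are standard CLI usage; the reward-server script path comes from the repository, while its flags and the project's training environment variables are project-specific and not shown.

```shell
# On the head node: start the Ray head process (default GCS port 6379).
ray start --head --port=6379

# On each worker node: join the cluster (HEAD_IP is the head node's address).
ray start --address=HEAD_IP:6379

# Launch the separate reward server (script name from the repo;
# required environment variables are documented by the project).
bash scripts/reward_server.sh

# Verify all nodes are connected before launching training.
ray status
```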