RL-tuned MoE vision-language model for reasoning tasks
Top 85.2% on sourcepulse
This project introduces Efficient-R1-VLLM, a novel approach to fine-tuning Mixture-of-Experts (MoE) vision-language models (VLLMs) for enhanced multimodal reasoning. Targeting researchers and developers working with VLLMs, it offers improved reasoning capabilities and training efficiency by applying reinforcement learning.
How It Works
Efficient-R1-VLLM applies Group Relative Policy Optimization (GRPO), a PPO-style reinforcement learning algorithm, to fine-tune the DeepSeek2-VL MoE model. The core innovation lies in modifying the training pipeline to require the model to generate an image caption before producing its reasoning output. This strategy, validated by performance improvements on the Qwen-7B-Instruct model, aims to better integrate visual information into the model's reasoning process. Training efficiency is further boosted by leveraging SGLang for accelerated trajectory sampling.
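A caption-before-reasoning constraint like the one described above is typically enforced with a format reward scored per sampled completion during GRPO training. The sketch below is a minimal illustration, not the project's actual implementation: the `<caption>` and `<think>` tag names are assumptions chosen for the example.

```python
import re

# Hypothetical output schema: the tag names are illustrative assumptions,
# not necessarily the delimiters Efficient-R1-VLLM uses.
CAPTION_RE = re.compile(r"<caption>(.+?)</caption>", re.DOTALL)
REASONING_RE = re.compile(r"<think>(.+?)</think>", re.DOTALL)

def format_reward(completion: str) -> float:
    """Score 1.0 only if a non-empty caption block precedes the reasoning block."""
    caption = CAPTION_RE.search(completion)
    reasoning = REASONING_RE.search(completion)
    if caption is None or reasoning is None:
        return 0.0
    # Reward only the caption-first ordering the training pipeline enforces.
    return 1.0 if caption.start() < reasoning.start() else 0.0
```

A per-completion reward function of this shape can be combined with a task-accuracy reward and passed to a GRPO trainer such as trl's `GRPOTrainer`, which accepts custom reward functions.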
Quick Start & Requirements
pip install -r requirements.txt
pip install "sglang[all]"
Maintenance & Community
The project acknowledges contributions from Bai Bizhe, Professor Wenqi Shao, and Qiaosheng Zhang. It builds upon and integrates open-source contributions from vLLM, Open-R1, and trl, and extends gratitude to DeepSeek-R1 and Qwen2.5-VL.
Licensing & Compatibility
The README does not explicitly state the license type or any compatibility notes for commercial use.
Limitations & Caveats
The project states that quick start code will be available soon, suggesting it is still under active development and not yet fully released for easy adoption.