Efficient-R1-VLLM by baibizhe

RL-tuned MoE vision-language model for reasoning tasks

created 5 months ago
324 stars

Top 85.2% on sourcepulse

Project Summary

This project introduces Efficient-R1-VLLM, an approach to fine-tuning Mixture-of-Experts (MoE) vision-language models (VLLMs) with reinforcement learning to strengthen multimodal reasoning. Aimed at researchers and developers working with VLLMs, it targets both better reasoning ability and more efficient training.

How It Works

Efficient-R1-VLLM applies GRPO (Group Relative Policy Optimization), a PPO-style reinforcement learning algorithm, to fine-tune the DeepSeek2-VL MoE model. Its core change is to the training pipeline: the model must generate an image caption before its reasoning output, which aims to carry visual information explicitly into the reasoning process. The project reports that this strategy improved performance on the Qwen-7B-Instruct model. Trajectory sampling during training is accelerated with SGLang.
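To make the GRPO step more concrete: instead of PPO's learned value baseline, GRPO samples a group of completions for the same prompt and normalizes each completion's reward against the group's mean and standard deviation. The sketch below shows only that advantage computation; it is not taken from this project's code, the function name is hypothetical, and it assumes the population standard deviation (implementations differ on this detail).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of completions.

    All rewards must come from completions sampled for the same prompt.
    The group mean acts as the baseline; the group std rescales the result.
    """
    mean = statistics.fmean(rewards)
    # Assumption: population std over the group; eps avoids division by zero
    # when every completion in the group received the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Two correct (1.0) and two incorrect (0.0) completions for one prompt.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate critic model has to be trained.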

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: NVIDIA GPUs with CUDA 12.x; SGLang, installed via pip install "sglang[all]"
  • Dependencies: Builds upon R1-Multimodal-Journey, vLLM, Open-R1, and trl.

Highlighted Details

  • Claims to be the first RL-tuned MoE vision-language model, built on DeepSeek2-VL-MoE.
  • Reports 1.7x faster trajectory sampling through SGLang integration.
  • Reports significant performance gains from enforcing image captions in both the prompts and the reward feedback.
  • Integrates an evaluation loop within the trl framework.
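One way to enforce "caption before reasoning" in the reward feedback is a format reward that pays out only when a completion follows the required order. The sketch below is a hypothetical illustration: the `<caption>`, `<think>` and `<answer>` tags and the function name are assumptions, not the project's confirmed prompt format or reward code.

```python
import re

# Assumed output layout (hypothetical tags): caption first, then reasoning, then answer.
CAPTION_FIRST_PATTERN = re.compile(
    r"^\s*<caption>(?P<caption>.+?)</caption>\s*"
    r"<think>(?P<think>.+?)</think>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,  # let captions and reasoning span multiple lines
)


def caption_first_format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the image is captioned before reasoning, else 0.0."""
    match = CAPTION_FIRST_PATTERN.match(completion)
    if match is None:
        # Wrong order, missing section, or extra text outside the tags.
        return 0.0
    # Reject whitespace-only captions so the model cannot skip the description.
    return 1.0 if match.group("caption").strip() else 0.0


if __name__ == "__main__":
    good = "<caption>A red cube on a table.</caption><think>One cube.</think><answer>1</answer>"
    bad = "<think>One cube.</think><caption>A red cube.</caption><answer>1</answer>"
    print(caption_first_format_reward(good), caption_first_format_reward(bad))
```

In a GRPO loop, a score like this would typically be combined with an answer-correctness reward before computing group-relative advantages.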

Maintenance & Community

The project acknowledges contributions from Bai Bizhe, Professor Wenqi Shao, and Qiaosheng Zhang. It builds upon and integrates open-source contributions from vLLM, Open-R1, and trl, and extends gratitude to DeepSeek-R1 and Qwen2.5-VL.

Licensing & Compatibility

The README does not explicitly state the license type or any compatibility notes for commercial use.

Limitations & Caveats

The project states that quick start code will be available soon, indicating it may still be under active development or not yet fully released for easy adoption.

Health Check
Last commit: 4 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 90 days
