baibizhe/Efficient-R1-VLLM: RL-tuned MoE vision-language model for reasoning tasks
Top 84.5% on SourcePulse
This project introduces Efficient-R1-VLLM, a novel approach that applies reinforcement learning to fine-tune Mixture-of-Experts (MoE) vision-language models (VLLMs) for enhanced multimodal reasoning. Targeting researchers and developers working with VLLMs, it offers improved reasoning capabilities and greater training efficiency.
How It Works
Efficient-R1-VLLM pioneers the application of Proximal Policy Optimization (PPO)-style reinforcement learning, specifically Group Relative Policy Optimization (GRPO), to fine-tune the DeepSeek-VL2 MoE model. The core innovation is a modified training pipeline that enforces image-caption generation before the reasoning output. This strategy, validated by performance improvements on the Qwen-7B-Instruct model, aims to ground the model's reasoning more firmly in visual information. Training efficiency is further boosted by leveraging SGLang for accelerated trajectory sampling.
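As a rough illustration of the recipe described above, the sketch below uses trl's GRPOTrainer with a reward that pays out only when a completion finishes a caption block before opening an answer block. This is not the project's released code: the <caption>/<answer> tags, reward values, placeholder dataset, and the small text-only stand-in model are all assumptions made to keep the example self-contained.

# Minimal GRPO sketch of a caption-before-reasoning reward, using trl.
# Tags, reward values, and model/dataset names are illustrative assumptions,
# not the project's actual configuration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def caption_first_reward(completions, **kwargs):
    """Reward 1.0 when a completion closes its <caption> block before
    opening an <answer> block, i.e. caption generation precedes reasoning."""
    rewards = []
    for text in completions:
        ok = (
            "</caption>" in text
            and "<answer>" in text
            and text.index("</caption>") < text.index("<answer>")
        )
        rewards.append(1.0 if ok else 0.0)
    return rewards

# Placeholder text dataset; the real project trains on multimodal data.
dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # small text-only stand-in, not DeepSeek-VL2
    reward_funcs=caption_first_reward,
    args=GRPOConfig(output_dir="grpo-caption-first", num_generations=8),
    train_dataset=dataset,
)
trainer.train()

In practice such a format reward is usually combined with a task-accuracy reward, so the model is pushed toward both well-structured and correct outputs.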
Quick Start & Requirements
pip install -r requirements.txt
pip install "sglang[all]"
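The README credits SGLang with accelerating trajectory sampling. One common way to use SGLang for rollouts is to serve the model through its launch_server entry point; the invocation below is a hypothetical example with a placeholder model path and port, not the project's documented setup.

python -m sglang.launch_server --model-path <your-model-path> --port 30000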
Maintenance & Community
The project acknowledges contributions from Bai Bizhe, Professor Wenqi Shao, and Qiaosheng Zhang. It builds upon and integrates open-source contributions from vLLM, Open-R1, and trl, and extends gratitude to DeepSeek-R1 and Qwen2.5-VL.
Licensing & Compatibility
The README does not explicitly state the license type or any compatibility notes for commercial use.
Limitations & Caveats
The project states that quick start code will be available soon, indicating it may still be under active development or not yet fully released for easy adoption.
Last updated: 10 months ago · Status: Inactive