Multimodal training fork for open-r1
Top 30.4% on sourcepulse
This repository provides a fork of open-r1 that enables multimodal model training, focusing on reinforcement learning (RL) for multimodal reasoning tasks. It targets researchers and developers interested in advancing multimodal AI capabilities, offering a framework and initial datasets for training and evaluating models such as Qwen2-VL with GRPO.
How It Works
The project integrates multimodal capabilities into the open-r1 framework, leveraging the GRPO algorithm. It supports Vision-Language Models (VLMs) available in the Hugging Face transformers library, including Qwen2-VL and Aria-MoE. The core contribution is its approach to multimodal RL training, exemplified by an 8k-sample multimodal RL training dataset focused on math reasoning, generated with GPT-4o and including verifiable answers and reasoning paths.
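Because the dataset ships verifiable answers, GRPO can use a simple rule-based reward rather than a learned reward model. The sketch below illustrates that idea; the function name, the `<answer>` tag convention, and the exact-match rule are illustrative assumptions, not the repository's actual code.

```python
import re

def accuracy_reward(completion: str, solution: str) -> float:
    """Rule-based reward sketch: 1.0 if the model's final answer matches
    the verifiable ground-truth solution, else 0.0. (Illustrative only;
    the repo's real reward functions may differ.)"""
    # Assumed convention: the model wraps its final answer in <answer> tags.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    predicted = match.group(1).strip() if match else completion.strip()
    return 1.0 if predicted == solution.strip() else 0.0

# GRPO samples a group of completions per prompt and normalizes these
# rewards within the group to form advantages, so no value model is needed.
```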
Quick Start & Requirements
- Install dependencies: `pip3 install vllm==0.6.6.post1`, `pip3 install -e ".[dev]"`, and `pip3 install wandb==0.18.3`.
- Launch training with `torchrun --nproc_per_node=8 ... src/open_r1/grpo.py ...` (an expanded sketch follows below).
- Uses `wandb` for experiment logging.
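For orientation, one plausible expansion of the training command is sketched below. The torchrun flags are standard distributed-launch options; the model name, dataset placeholder, output path, and training arguments are illustrative assumptions, so check the repository's own scripts for the exact invocation.

```bash
# Sketch only: the arguments after grpo.py are assumed placeholders,
# not the repository's verified invocation.
torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
    --master_addr=127.0.0.1 --master_port=12345 \
    src/open_r1/grpo.py \
    --model_name_or_path Qwen/Qwen2-VL-2B-Instruct \
    --dataset_name <your-multimodal-rl-dataset> \
    --output_dir checkpoints/qwen2-vl-2b-grpo \
    --per_device_train_batch_size 1 \
    --report_to wandb
```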
Highlighted Details

- Builds on `huggingface/open-r1` and follows the approach of `deepseek-ai/DeepSeek-R1`.
- Released GRPO-trained checkpoints: `lmms-lab/Qwen2-VL-2B-GRPO-8k` and `lmms-lab/Qwen2-VL-7B-GRPO-8k` (see the inference sketch below).
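To try a released checkpoint, a minimal inference sketch follows, using the standard Qwen2-VL classes from transformers. It assumes the checkpoint ships the usual Qwen2-VL processor configuration; the image path and prompt are placeholders.

```python
# Minimal inference sketch for a released checkpoint; assumes the
# checkpoint includes the standard Qwen2-VL processor configuration.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "lmms-lab/Qwen2-VL-2B-GRPO-8k"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("geometry_problem.png")  # placeholder image path
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem in the image step by step."},
    ],
}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding the model's reasoning and answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```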
Maintenance & Community

- The maintainers point users to the upstream open-r1 project for better community support.
Licensing & Compatibility

- The underlying open-r1 and transformers libraries carry their own licenses (typically Apache 2.0 or MIT).
- The datasets and trained models are hosted on Hugging Face, where their respective licenses apply.

Limitations & Caveats
- The repository was last updated about 5 months ago.