EvolvingLMMs-Lab: Multimodal training fork for open-r1
Top 28.7% on SourcePulse
This repository provides a fork of open-r1 that enables multimodal model training, focusing on Reinforcement Learning from Human Feedback (RLHF) for multimodal reasoning tasks. It targets researchers and developers interested in advancing multimodal AI capabilities, offering a framework and initial datasets for training and evaluating models like Qwen2-VL with Group Relative Policy Optimization (GRPO).
How It Works
The project integrates multimodal capabilities into the open-r1 framework, leveraging the GRPO algorithm. It supports various Vision-Language Models (VLMs) available in the Hugging Face transformers library, including Qwen2-VL and Aria-MoE. The core innovation lies in its approach to multimodal RL training, exemplified by the creation of an 8k multimodal RL training dataset focused on math reasoning, generated with GPT-4o and including verifiable answers and reasoning paths.
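To make "verifiable answers" concrete, below is a minimal sketch of a rule-based accuracy reward in the style open-r1 uses for GRPO. The function name, the <answer> tag convention, and the "solution" field name are illustrative assumptions, not necessarily this fork's exact code.

```python
import re

def accuracy_reward(completions, solution, **kwargs):
    """Rule-based reward: 1.0 when the model's final answer matches the
    dataset's verifiable ground-truth answer, else 0.0.

    Assumptions for illustration: completions are strings, answers are
    wrapped in <answer>...</answer> tags, and the dataset stores the
    ground truth in a "solution" column.
    """
    rewards = []
    for completion, ground_truth in zip(completions, solution):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        predicted = match.group(1).strip() if match else completion.strip()
        rewards.append(1.0 if predicted == str(ground_truth).strip() else 0.0)
    return rewards
```

Because the reward is computed by string matching against a stored answer rather than by a learned reward model, the training signal stays cheap and unambiguous, which is what makes the GPT-4o-generated dataset's verifiable answers useful for RL.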
Quick Start & Requirements
Install dependencies:

pip3 install vllm==0.6.6.post1
pip3 install -e ".[dev]"
pip3 install wandb==0.18.3

Launch multi-GPU GRPO training (the elided arguments are documented in the repository):

torchrun --nproc_per_node=8 ... src/open_r1/grpo.py ...

Training runs use wandb for logging.
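For orientation, here is a text-only skeleton of the kind of run that grpo.py launches, mirroring TRL's documented GRPOTrainer quickstart. The dataset, model id, reward function, and config values are placeholders, not this fork's configuration; the fork additionally adapts this flow to thread image inputs through the Qwen2-VL processor.

```python
# Text-only GRPO skeleton using TRL's GRPOTrainer. All identifiers here
# are illustrative placeholders; the fork extends this pattern to VLMs.
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset with a "prompt" column (from TRL's examples).
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_format(completions, **kwargs):
    # Stand-in reward: 1.0 if the completion follows the R1-style
    # <think>...</think><answer>...</answer> layout, else 0.0.
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="grpo-demo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder; the fork targets Qwen2-VL
    reward_funcs=reward_format,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```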
Highlighted Details

Built on huggingface/open-r1 and deepseek-ai/DeepSeek-R1. Trained checkpoints are released as lmms-lab/Qwen2-VL-2B-GRPO-8k and lmms-lab/Qwen2-VL-7B-GRPO-8k.
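As an illustration, the released checkpoints should load with the standard Qwen2-VL classes in transformers; the image path and prompt below are placeholders, not part of the repository.

```python
# Sketch of inference with a released checkpoint, assuming the standard
# Qwen2-VL loading path in transformers. Image path and prompt are
# hypothetical placeholders.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "lmms-lab/Qwen2-VL-7B-GRPO-8k"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("problem.png")  # hypothetical image of a math problem
conversation = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Solve the problem in the image step by step."},
]}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```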
Maintenance & Community

Users are directed to the upstream open-r1 for better community support.
Licensing & Compatibility

The underlying open-r1 and transformers libraries carry their own licenses (typically Apache 2.0 or MIT). The datasets and trained models are hosted on Hugging Face, so their respective licenses apply.

Limitations & Caveats
The repository is inactive; the last update was about 8 months ago.