open-r1-multimodal by EvolvingLMMs-Lab

Multimodal training fork for open-r1

Created 6 months ago · 1,349 stars · Top 30.4% on sourcepulse

Project Summary

This repository is a fork of open-r1 that enables multimodal model training, specifically reinforcement learning (RL) with verifiable rewards for multimodal reasoning tasks. It targets researchers and developers interested in advancing multimodal AI capabilities, offering a training framework and initial datasets for training and evaluating models such as Qwen2-VL with GRPO.

How It Works

The project integrates multimodal capabilities into the open-r1 framework, leveraging the GRPO algorithm. It supports various Vision-Language Models (VLMs) available in the Hugging Face transformers library, including Qwen2-VL and Aria-MoE. The core innovation lies in its approach to multimodal RL training, exemplified by the creation of an 8k multimodal RL training dataset focused on math reasoning, generated with GPT-4o and including verifiable answers and reasoning paths.
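The "verifiable answers" approach described above can be sketched as a pair of reward functions: one scoring answer correctness against the ground truth, one scoring adherence to the R1-style output format. This is an illustrative sketch, not the repository's actual implementation; the function names and answer-extraction patterns are assumptions.

```python
import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Reward 1.0 if the model's final answer matches the verifiable
    ground truth, else 0.0. Looks for an answer inside \\boxed{...}
    or after a trailing 'Answer:' marker (assumed conventions)."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        match = re.search(r"Answer:\s*(.+)", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def format_reward(completion: str) -> float:
    """Reward 1.0 if the completion follows the R1-style
    <think>...</think><answer>...</answer> layout, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion, re.DOTALL) else 0.0
```

GRPO then normalizes these rewards within each group of sampled completions to form the policy-gradient advantage, which is why cheaply checkable (verifiable) answers matter: the reward needs no learned judge.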

Quick Start & Requirements
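A typical setup for an open-r1-style repository looks like the sketch below. The script path, flags, and dataset name are assumptions, not confirmed by this page; consult the repository README for the actual invocation.

```shell
# Hypothetical quick start -- script names and flags may differ from the repo.
git clone https://github.com/EvolvingLMMs-Lab/open-r1-multimodal.git
cd open-r1-multimodal
pip install -e .

# GRPO training on the multimodal math dataset (assumed paths/arguments)
accelerate launch src/open_r1/grpo.py \
    --model_name_or_path Qwen/Qwen2-VL-2B-Instruct \
    --dataset_name lmms-lab/multimodal-open-r1-8k-verified \
    --output_dir checkpoints/qwen2-vl-2b-grpo
```

Note the caveat under Limitations: one epoch for Qwen2-VL-2B takes roughly 10 hours on 8 H100 GPUs, so budget accordingly.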

Highlighted Details

  • Implements multimodal R1 based on huggingface/open-r1 and deepseek-ai/DeepSeek-R1.
  • Integrates Qwen2-VL series, Aria-MoE, and other VLMs.
  • Open-sourced 8k multimodal RL training examples for math reasoning, generated by GPT-4o.
  • Open-sourced GRPO-trained models: lmms-lab/Qwen2-VL-2B-GRPO-8k and lmms-lab/Qwen2-VL-7B-GRPO-8k.
  • Customizes verification logic for multiple-choice math problems.
  • Demonstrates improved performance in reasoning-based chain-of-thought (CoT) settings compared to base models.
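The customized multiple-choice verification mentioned above can be illustrated with a small checker that tolerates common answer formats ("(B)", "B.", "The answer is B"). This is a hypothetical helper for illustration; the repository's actual verification logic may differ.

```python
import re

def verify_multiple_choice(completion: str, correct_choice: str) -> bool:
    """Return True if the completion's final answer names the correct
    option letter (A-E). Only text after the last 'Answer:' marker is
    considered, if present; otherwise the whole completion is scanned,
    and the last standalone option letter wins."""
    answer_part = completion.rsplit("Answer:", 1)[-1]
    letters = re.findall(r"\b([A-E])\b", answer_part.upper())
    return bool(letters) and letters[-1] == correct_choice.upper()
```

Taking the last matched letter makes the check robust to chain-of-thought traces that mention several options before committing to one.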

Maintenance & Community

  • Community feedback is welcomed to improve understanding of multimodal reasoning models.
  • Plans to upstream changes to open-r1 via pull request for better community support.
  • Discussions on dataset curation and scaling efficiency are ongoing.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. The underlying open-r1 and transformers libraries have their own licenses (typically Apache 2.0 or MIT). The datasets and trained models are hosted on Hugging Face, implying their respective licenses apply.

Limitations & Caveats

  • The current framework is not efficient for large-scale training; one epoch for Qwen2-VL-2B takes about 10 hours on 8 H100 GPUs.
  • Initial models may quickly optimize for reward format over accuracy.
  • Evaluation frameworks for visual reasoning tasks are limited in processing step-by-step reasoning traces.
  • Expanding RL datasets beyond math scenarios with verifiable answers requires further exploration.
Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 119 stars in the last 90 days
