MM-EUREKA by ModalMinds

Multimodal reasoning models using rule-based reinforcement learning

created 4 months ago
715 stars

Top 49.0% on sourcepulse

View on GitHub
Project Summary

MM-EUREKA provides a framework for multimodal reasoning, extending rule-based reinforcement learning to vision-language models. It targets researchers and developers aiming to improve performance on complex reasoning tasks, reporting gains over larger open-source models on multimodal math benchmarks such as MathVista and WeMath.

How It Works

MM-EUREKA builds upon OpenRLHF, integrating vision-language models (VLMs) and supporting RL algorithms such as GRPO, REINFORCE++, and RLOO. It features a hybrid training engine with vLLM integration for efficient distributed training, along with enhanced rule-based reward mechanisms. Key improvements include online filtering for experience quality, ADORA for adaptive rollout adjustment, and DAPO for improved loss functions; a minimal sketch of the reward and filtering ideas follows below. The architecture upgrades the base model to Qwen2.5-VL, freezes the Vision Transformer (ViT) module, and transitions to an online data-filtering strategy.
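
The rule-based reward, GRPO-style group normalization, and online filtering described above can be illustrated with a short, self-contained sketch. The function names, the \boxed{} answer check, and the bonus values here are illustrative assumptions, not the repository's actual implementation.

    import re
    import numpy as np

    def rule_based_reward(response: str, ground_truth: str) -> float:
        """Illustrative rule-based reward (assumption): 1.0 if the boxed answer
        matches the reference exactly, plus a small format bonus."""
        match = re.search(r"\\boxed\{(.+?)\}", response)
        accuracy = 1.0 if match and match.group(1).strip() == ground_truth.strip() else 0.0
        format_bonus = 0.1 if match else 0.0
        return accuracy + format_bonus

    def grpo_advantages(rewards: list[float]) -> np.ndarray:
        """GRPO-style advantage: normalize each rollout's reward by the mean and
        std of its group (all rollouts sampled for the same prompt)."""
        r = np.asarray(rewards, dtype=np.float32)
        return (r - r.mean()) / (r.std() + 1e-8)

    def keep_group(rewards: list[float]) -> bool:
        """Online-filtering idea: drop prompt groups whose rollouts are all
        correct or all wrong, since they carry no learning signal."""
        frac_correct = np.mean([r >= 1.0 for r in rewards])
        return 0.0 < frac_correct < 1.0

In a training loop, each prompt would be sampled several times; groups that pass keep_group would contribute group-normalized advantages to the policy-gradient update, with ADORA and DAPO further adjusting rollout weighting and the loss, as the repository describes.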

Quick Start & Requirements

  • Installation: git clone https://github.com/ModalMinds/MM-EUREKA.git, cd MM-EUREKA, git checkout qwen, pip install -e .[vllm], pip install flash_attn --no-build-isolation.
  • Prerequisites: Python, vLLM, flash_attn.
  • Data: Requires downloading the MMK12 dataset. Custom data must be in JSONL format; a hypothetical record layout is sketched after this list.
  • Training: Scripts are provided for single and multi-node training. Environment variables like $MASTER_ADDR and $NODE_RANK need configuration.
  • Links: Report, Models, Dataset, Code
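
As an illustration of the custom-data requirement above, each JSONL line would be a standalone JSON object. The field names and file name below are assumptions for illustration, not the repository's documented schema; check the repo's own examples for the required keys.

    import json

    # Hypothetical record layout for a custom JSONL training file.
    sample = {
        "id": "custom-0001",
        "image": "images/problem_0001.png",
        "question": "What is the area of the shaded region?",
        "answer": "12",
    }

    with open("custom_train.jsonl", "w", encoding="utf-8") as f:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")

    # Quick sanity check: every line must parse as a JSON object.
    with open("custom_train.jsonl", encoding="utf-8") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            assert isinstance(record, dict), f"line {i} is not a JSON object"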

Highlighted Details

  • MM-Eureka-Qwen-7B achieves 73.0 on MathVista (testmini), surpassing InternVL2.5-78B.
  • MM-Eureka-Qwen-32B scores 73.4 on WeMath, outperforming most open-source and closed-source models.
  • The MMK12 dataset, with 15k samples and 2k MCQs across K12 subjects, is open-sourced for evaluation.
  • The complete pipeline, including code, models, and data, is available to foster research.

Maintenance & Community

The project is under active development; contributions are welcome via pull requests or issues. Community engagement is encouraged via a WeChat group. Key acknowledgements include OpenRLHF, LMM-R1, and vLLM.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial use. Detailed setup instructions for distributed training beyond environment variable configuration are not provided.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 6
Star History
138 stars in the last 90 days
