MM-EUREKA by ModalMinds

Multimodal reasoning models using rule-based reinforcement learning

Created 6 months ago
734 stars

Top 47.2% on SourcePulse

View on GitHub
Project Summary

MM-EUREKA provides a framework for multimodal reasoning, extending rule-based reinforcement learning to vision-language models. It targets researchers and developers aiming to improve performance on complex reasoning tasks, offering enhanced capabilities over previous multimodal models.

How It Works

MM-EUREKA builds upon OpenRLHF, integrating vision-language models (VLMs) and supporting advanced RL algorithms like GRPO, REINFORCE++, and RLOO. It features a hybrid training engine with vLLM integration for efficient distributed training and enhanced rule-based reward mechanisms. Key improvements include online filtering for experience quality, ADORA for adaptive rollout adjustment, and DAPO for improved loss functions. The architecture upgrades the base model to Qwen2.5-VL, freezes the Vision Transformer (ViT) module, and transitions to an online data filtering strategy.
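To make the ideas above concrete, here is a minimal, illustrative sketch of a rule-based reward and of online experience filtering of the kind described. The function names, the `\boxed{...}` answer convention, and the filtering thresholds are assumptions for illustration, not the repository's actual API:

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Illustrative rule-based reward: 1.0 if the answer written as
    \\boxed{...} in the model's response matches the ground truth,
    else 0.0. Real implementations typically add format checks and
    more robust answer normalization."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0  # no parsable answer -> zero reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

def online_filter(samples, lo=0.0, hi=1.0):
    """Illustrative online filter: drop prompts whose rollout group is
    all-correct or all-wrong, since such groups carry no learning
    signal under group-relative methods like GRPO."""
    kept = []
    for prompt, rewards in samples:
        mean = sum(rewards) / len(rewards)
        if lo < mean < hi:  # keep only prompts with mixed outcomes
            kept.append((prompt, rewards))
    return kept
```

The filtering step matters because GRPO normalizes rewards within each rollout group: if every rollout for a prompt gets the same reward, the advantage is zero and the prompt wastes compute.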

Quick Start & Requirements

  • Installation: git clone https://github.com/ModalMinds/MM-EUREKA.git, cd MM-EUREKA, git checkout qwen, pip install -e .[vllm], pip install flash_attn --no-build-isolation.
  • Prerequisites: Python, vLLM, flash_attn.
  • Data: Requires downloading the MMK12 dataset. Custom data must be in JSONL format.
  • Training: Scripts are provided for single and multi-node training. Environment variables like $MASTER_ADDR and $NODE_RANK need configuration.
  • Links: Report, Models, Dataset, Code
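Since custom data must be supplied as JSONL (one JSON object per line), the snippet below shows how such a file can be written and read back. The field names (`question`, `answer`, `image`) are hypothetical placeholders; consult the MMK12 dataset files shipped with the repository for the actual schema:

```python
import json

# Hypothetical JSONL record; check field names against the MMK12
# dataset examples in the repository before training on custom data.
sample = {
    "id": "example-0001",
    "question": "What is the area of the triangle shown?",
    "answer": "6",
    "image": "images/example-0001.png",
}

# JSONL: one JSON object per line, newline-terminated.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")

with open("train.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
```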

Highlighted Details

  • MM-Eureka-Qwen-7B achieves 73.0 on MathVista (testmini), surpassing InternVL2.5-78B.
  • MM-Eureka-Qwen-32B scores 73.4 on WeMath, outperforming most open and closed-source models.
  • The MMK12 dataset, with 15k samples and 2k MCQs across K12 subjects, is open-sourced for evaluation.
  • The complete pipeline, including code, models, and data, is available to foster research.

Maintenance & Community

The project is under active development with contributions welcomed via pull requests or issues. Community engagement is encouraged via a WeChat group. Key acknowledgements include contributions from OpenRLHF, LMM-R1, and vLLM.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The license is unspecified, which may complicate commercial adoption. Beyond the environment variables mentioned above, detailed setup instructions for distributed training are not provided.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

Top 0.1% on SourcePulse, 4k stars
RL recipe for reasoning ability in models
Created 7 months ago, updated 1 month ago