MM-EUREKA by ModalMinds

Multimodal reasoning models using rule-based reinforcement learning

Created 6 months ago
734 stars

Top 47.2% on SourcePulse

View on GitHub
Project Summary

MM-EUREKA provides a framework for multimodal reasoning, extending rule-based reinforcement learning to vision-language models. It targets researchers and developers aiming to improve performance on complex reasoning tasks, offering enhanced capabilities over previous multimodal models.

How It Works

MM-EUREKA builds upon OpenRLHF, integrating vision-language models (VLMs) and supporting advanced RL algorithms like GRPO, REINFORCE++, and RLOO. It features a hybrid training engine with vLLM integration for efficient distributed training and enhanced rule-based reward mechanisms. Key improvements include online filtering for experience quality, ADORA for adaptive rollout adjustment, and DAPO for improved loss functions. The architecture upgrades the base model to Qwen2.5-VL, freezes the Vision Transformer (ViT) module, and transitions to an online data filtering strategy.
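To make the ideas above concrete, here is a minimal, illustrative sketch of a rule-based reward and of online experience filtering of the kind described. The function names, the `\boxed{...}` answer convention, and the filtering thresholds are assumptions for illustration, not the repository's actual API:

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Illustrative rule-based reward: 1.0 if the answer written as
    \\boxed{...} in the model's response matches the ground truth,
    else 0.0. Real implementations typically add format checks and
    more robust answer normalization."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0  # no parsable answer -> zero reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

def online_filter(samples, lo=0.0, hi=1.0):
    """Illustrative online filter: drop prompts whose rollout group is
    all-correct or all-wrong, since such groups carry no learning
    signal under group-relative methods like GRPO."""
    kept = []
    for prompt, rewards in samples:
        mean = sum(rewards) / len(rewards)
        if lo < mean < hi:  # keep only prompts with mixed outcomes
            kept.append((prompt, rewards))
    return kept
```

The filtering step matters because GRPO normalizes rewards within each rollout group: if every rollout for a prompt gets the same reward, the advantage is zero and the prompt wastes compute.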

Quick Start & Requirements

  • Installation: git clone https://github.com/ModalMinds/MM-EUREKA.git, cd MM-EUREKA, git checkout qwen, pip install -e .[vllm], pip install flash_attn --no-build-isolation.
  • Prerequisites: Python, vLLM, flash_attn.
  • Data: Requires downloading the MMK12 dataset. Custom data must be in JSONL format.
  • Training: Scripts are provided for single and multi-node training. Environment variables like $MASTER_ADDR and $NODE_RANK need configuration.
  • Links: Report, Models, Dataset, Code
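Since custom data must be supplied as JSONL (one JSON object per line), the snippet below shows how such a file can be written and read back. The field names (`question`, `answer`, `image`) are hypothetical placeholders; consult the MMK12 dataset files shipped with the repository for the actual schema:

```python
import json

# Hypothetical JSONL record; check field names against the MMK12
# dataset examples in the repository before training on custom data.
sample = {
    "id": "example-0001",
    "question": "What is the area of the triangle shown?",
    "answer": "6",
    "image": "images/example-0001.png",
}

# JSONL: one JSON object per line, newline-terminated.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")

with open("train.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
```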

Highlighted Details

  • MM-Eureka-Qwen-7B achieves 73.0 on MathVista (testmini), surpassing InternVL2.5-78B.
  • MM-Eureka-Qwen-32B scores 73.4 on WeMath, outperforming most open and closed-source models.
  • The MMK12 dataset, with 15k samples and 2k MCQs across K12 subjects, is open-sourced for evaluation.
  • The complete pipeline, including code, models, and data, is available to foster research.

Maintenance & Community

The project is under active development with contributions welcomed via pull requests or issues. Community engagement is encouraged via a WeChat group. Key acknowledgements include contributions from OpenRLHF, LMM-R1, and vLLM.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The license is unspecified, which may complicate commercial adoption. Beyond the environment variables mentioned above, detailed setup instructions for distributed training are not provided.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

Top 0.1% on SourcePulse, 4k stars
RL recipe for reasoning ability in models
Created 7 months ago, updated 1 month ago