Multimodal reasoning models using rule-based reinforcement learning
MM-EUREKA is a framework for multimodal reasoning that extends rule-based reinforcement learning to vision-language models. It targets researchers and developers who want to improve performance on complex reasoning tasks beyond what earlier multimodal models achieve.
How It Works
MM-EUREKA builds upon OpenRLHF, integrating vision-language models (VLMs) and supporting advanced RL algorithms like GRPO, REINFORCE++, and RLOO. It features a hybrid training engine with vLLM integration for efficient distributed training and enhanced rule-based reward mechanisms. Key improvements include online filtering for experience quality, ADORA for adaptive rollout adjustment, and DAPO for improved loss functions. The architecture upgrades the base model to Qwen2.5-VL, freezes the Vision Transformer (ViT) module, and transitions to an online data filtering strategy.
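To make the training signal concrete, the following is a minimal, illustrative Python sketch of a GRPO-style group-relative advantage combined with a rule-based reward and online filtering. The exact-match reward and the function names are simplifications for illustration, not MM-EUREKA's actual implementation.

import numpy as np

def rule_based_reward(response: str, answer: str) -> float:
    # Toy rule-based reward: 1.0 for an exact answer match, else 0.0.
    # Real rule-based rewards typically also check output formatting.
    return 1.0 if response.strip() == answer.strip() else 0.0

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    # GRPO normalizes each rollout's reward against the group of rollouts
    # sampled from the same prompt, avoiding a learned value critic.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def keep_group(rewards: np.ndarray) -> bool:
    # Online filtering (simplified): drop groups where every rollout scores
    # the same, since their advantages are zero and carry no gradient signal.
    return rewards.min() != rewards.max()

# Four sampled answers to one image+question prompt whose answer is "42".
rewards = np.array([rule_based_reward(r, "42") for r in ["42", "41", "42", "7"]])
if keep_group(rewards):
    print(grpo_advantages(rewards))  # correct rollouts get positive advantage

In full training, these advantages weight the policy-gradient loss over the sampled tokens; per the description above, DAPO modifies that loss and ADORA adaptively adjusts the rollouts.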
Quick Start & Requirements
git clone https://github.com/ModalMinds/MM-EUREKA.git
cd MM-EUREKA
git checkout qwen
pip install -e .[vllm]
pip install flash_attn --no-build-isolation

For distributed training, the $MASTER_ADDR and $NODE_RANK environment variables need to be configured.
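As a generic illustration of how such variables are consumed, the sketch below initializes PyTorch distributed training from the environment. This is not MM-EUREKA's actual launch script (which builds on OpenRLHF); the defaults shown only allow a one-process local smoke test.

import os
import torch.distributed as dist

# MASTER_ADDR / MASTER_PORT point every process at the rank-0 node.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
# RANK is per process; with one process per node it equals NODE_RANK.
os.environ.setdefault("RANK", os.environ.get("NODE_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="gloo")  # use "nccl" on multi-GPU nodes
print(f"rank {dist.get_rank()} of {dist.get_world_size()}")
dist.destroy_process_group()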
Highlighted Details
Maintenance & Community
The project is under active development with contributions welcomed via pull requests or issues. Community engagement is encouraged via a WeChat group. Key acknowledgements include contributions from OpenRLHF, LMM-R1, and vLLM.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The README does not specify the exact license, which may impact commercial use. Detailed setup instructions for distributed training beyond environment variable configuration are not provided.