Multimodal training code for geometry problem solving
Top 87.4% on sourcepulse
This repository, MM-Eureka-V0 (also known as R1-Multimodal-Journey), addresses challenges in multimodal reasoning, particularly complex tasks such as geometry problem solving. It targets researchers and engineers working with vision-language models (VLMs), aiming to improve their reasoning capabilities and training efficiency. The project offers a faster training pipeline and explores reinforcement learning techniques for VLMs.
How It Works
MM-Eureka-V0 speeds up training by integrating vLLM for generation, achieving a 5-6x speedup over previous implementations. It applies DeepSeek-R1-style reinforcement learning (RL) to improve performance on challenging geometry problems, training on a subset of the geo170k dataset. The project reports that "aha moments" can emerge early in training, even with smaller models.
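The summary above does not spell out the reward design. As a rough illustration of the rule-based rewards typically used in R1-style RL on math problems, the sketch below scores a completion on answer correctness plus output format; the tag convention, weights, and the geometry_reward name are assumptions for illustration, not the repository's actual code.

import re

# Hedged sketch of an R1-style rule-based reward: a small bonus for following
# a <think>...</think><answer>...</answer> format, plus a larger reward when
# the extracted answer matches the ground truth. Details are assumptions.
def geometry_reward(completion: str, ground_truth: str) -> float:
    reward = 0.0
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.1  # format reward
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        reward += 1.0  # accuracy reward
    return reward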
Quick Start & Requirements
Install dependencies: pip install vllm==0.7.2 trl==0.15.0.dev0
Prepare the dataset: edit local_scripts/gen_dataset.py as needed and run python local_scripts/gen_dataset.py. Images are stored as paths rather than PIL objects, for vLLM compatibility (see the sketch after these steps).
Train: edit local_scripts/train_qwen2_5_3b.sh for your setup and run sh local_scripts/train_qwen2_5_3b.sh.
Evaluate: python eval/evaluate_mathvista.py --checkpoint ${CHECKPOINT} --datasets MathVista_testmini
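To illustrate the path-based image storage mentioned in the dataset step, here is a minimal sketch of writing a geo170k-style record with an image path instead of a loaded PIL image. The field names (image, problem, answer) and the output filename are assumptions, not the repository's actual schema.

import json

# Hedged sketch: keep only the image *path* in each record so vLLM can load
# images itself at generation time. Field names are assumptions.
samples = [
    {"image": "images/geo/0001.png", "problem": "Find angle ABC.", "answer": "35"},
]

with open("geo170k_subset.jsonl", "w") as f:
    for sample in samples:
        # Do NOT open the image with PIL here; store the path only.
        f.write(json.dumps(sample) + "\n")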
Highlighted Details
Maintenance & Community
Core contributors include Lingxiao Du, Xiangyan Liu, and Fanqing Meng. Project leaders are Wenqi Shao and Qiaosheng Zhang. Interns are being sought at Shanghai AI Lab.
Licensing & Compatibility
The README does not explicitly state the license. It mentions building upon Open-R1-Multimodal, vLLM, and trl, and gratitude towards DeepSeek-R1 and Qwen2.5-VL, suggesting potential compatibility with their licenses.
Limitations & Caveats
VLMs appear to struggle to reproduce the response-length growth seen in text-only R1-style training, and high-quality multimodal reasoning data remains scarce. The project notes that overly simple datasets can lead to overfitting. The default vLLM generation device is cuda:7, which may prevent out-of-the-box training on systems with fewer GPUs.
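If the training script builds on trl's GRPOTrainer, the vLLM placement can usually be changed through the training configuration rather than by editing code. The sketch below is an assumption about how that might look; the exact option names vary across trl versions and may differ from this repository's script.

from trl import GRPOConfig

# Hedged sketch: trl releases around 0.15 exposed vLLM options on GRPOConfig
# such as use_vllm and vllm_device; exact names may differ in other versions.
config = GRPOConfig(
    output_dir="outputs/qwen2_5_3b_grpo",
    use_vllm=True,
    vllm_device="cuda:0",  # move generation off the default cuda:7 on smaller machines
    vllm_gpu_memory_utilization=0.7,
)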