Multimodal benchmark for expert AGI evaluation
MMMU is a benchmark suite designed to evaluate multimodal large language models (MLLMs) on college-level subject knowledge and complex reasoning across diverse disciplines. It targets researchers and developers building expert-level Artificial General Intelligence (AGI) systems, offering a rigorous assessment of advanced perception and reasoning capabilities beyond existing benchmarks.
How It Works
MMMU comprises 11.5K multimodal questions covering 30 subjects across six disciplines, featuring 30 highly heterogeneous image types such as charts, diagrams, maps, tables, music sheets, and chemical structures. MMMU-Pro strengthens the benchmark by filtering out questions that text-only models can answer, augmenting the candidate options with additional plausible distractors, and adding a vision-only input setting in which the question is embedded in the image itself, so models must read and reason over visual and textual content simultaneously. The goal is to simulate expert-level cognitive tasks and provide a more robust evaluation of intrinsic multimodal understanding.
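To make the data layout concrete, the sketch below loads a few MMMU questions through the Hugging Face `datasets` library and prints their fields. The dataset id `MMMU/MMMU`, the per-subject configuration name, the `validation` split, and the field names are assumptions based on the public release and may differ from your copy.

```python
# Minimal sketch for inspecting MMMU questions, assuming the benchmark is hosted
# on the Hugging Face Hub as "MMMU/MMMU" with per-subject configurations
# (e.g., "Accounting") and dev/validation/test splits -- verify names locally.
from datasets import load_dataset

subject = "Accounting"  # hypothetical choice of one of the 30 subjects
ds = load_dataset("MMMU/MMMU", subject, split="validation")

for example in ds.select(range(3)):
    print(example["id"])        # question identifier
    print(example["question"])  # question text, may reference interleaved images
    print(example["options"])   # multiple-choice candidate options
    print(example["answer"])    # gold option letter (withheld on the test split)
```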
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Answers and explanations for the test set are withheld; scoring on that split requires submitting predictions to EvalAI. The license for the evaluation code and datasets is not clearly stated in the README, which may complicate commercial use or integration into closed-source projects.
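Because test-split scoring goes through EvalAI, predictions have to be packaged into a submission file. The sketch below shows one plausible way to collect model outputs as a JSON mapping from question IDs to predicted option letters; the exact schema expected by the MMMU EvalAI challenge is an assumption here and should be checked against the repository's submission instructions.

```python
# Hypothetical helper for packaging test-split predictions as a JSON file of
# {question_id: predicted_option_letter}; the exact schema required by the
# MMMU EvalAI challenge is an assumption -- verify against the repo's docs.
import json

def write_submission(predictions: dict[str, str], path: str = "mmmu_test_predictions.json") -> None:
    """predictions maps MMMU question IDs to predicted option letters, e.g. 'B'."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)

if __name__ == "__main__":
    # Toy example with made-up IDs; real IDs come from the test split records.
    write_submission({"test_Accounting_1": "B", "test_Art_12": "D"})
```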