MMMU
by CUHK / Waterloo · free · Last verified 2026-03-01
MMMU is a challenging multimodal benchmark designed to evaluate large multimodal models on expert-level tasks. It contains over 11,500 college-level problems spanning six core disciplines (30 subjects), posed as multiple-choice and open-ended questions that require models to combine deep subject knowledge with visual perception and detailed reasoning.
https://mmmu-benchmark.github.io
B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: A · Citations: B+ · Engagement: F
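For concreteness, here is a minimal sketch of pulling one MMMU subject with the Hugging Face `datasets` library. The `MMMU/MMMU` hub id, the per-subject config names (e.g. `Accounting`), and the field names follow the benchmark's public release, but treat them as assumptions and check the dataset card before relying on them.

```python
# Minimal sketch: load one MMMU subject via Hugging Face `datasets`.
# The hub id "MMMU/MMMU", the per-subject configs, and the field names
# below are assumptions taken from the public release; verify against
# the dataset card.
from datasets import load_dataset

# Each of the 30 subjects is its own config. "validation" is the split
# usually reported in papers; the test split withholds gold answers.
ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")

sample = ds[0]
print(sample["question"])  # question text; may reference attached images
print(sample["options"])   # candidate answers (multiple-choice items)
print(sample["answer"])    # gold answer letter, e.g. "B"
```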
Specifications
- License
- Apache-2.0
- Pricing
- free
- Capabilities
- evaluating expert-level multimodal reasoning, assessing visual question answering in specialized domains, benchmarking large multimodal models (LMMs), testing knowledge across humanities, sciences, and engineering, measuring few-shot learning on complex problems, analyzing model performance on problems requiring chain-of-thought reasoning, providing a standardized test for college-level AI capabilities
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro
- Metrics
- accuracy, per-discipline-accuracy (see the scoring sketch after this list)
- Methodology
- College-level multiple-choice and open-ended questions with image inputs across 30 subjects. Tests both visual understanding and domain knowledge.
- Last Run
- 2026-03-01
- Tags
- benchmark, evaluation, multimodal, reasoning, expert-level, lmm-evaluation, visual-question-answering, vqa, college-level, science-reasoning, chain-of-thought
- Added
- 2026-03-17
- Completeness
- 90%
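Per the Methodology and Metrics entries above, scoring reduces to extracting a final choice from the model's (possibly chain-of-thought) output and averaging correctness overall and within each discipline. The sketch below is a hypothetical harness fragment, not the official MMMU evaluation code; the extraction regex and the record shape are assumptions.

```python
# Hypothetical scoring sketch for MMMU-style results: extract a final
# answer letter from free-form model output, then compute overall and
# per-discipline accuracy. Not the official harness; the data shapes
# and the extraction heuristic are assumptions.
import re
from collections import defaultdict

def extract_choice(output: str) -> str:
    """Take the last standalone A-E letter as the model's final choice."""
    letters = re.findall(r"\b([A-E])\b", output.upper())
    return letters[-1] if letters else ""

def mmmu_accuracy(records):
    """records: iterable of (model_output, gold_letter, discipline)."""
    correct, total = defaultdict(int), defaultdict(int)
    for output, gold, discipline in records:
        total[discipline] += 1
        if extract_choice(output) == gold.strip().upper():
            correct[discipline] += 1
    overall = sum(correct.values()) / max(sum(total.values()), 1)
    per_discipline = {d: correct[d] / total[d] for d in total}
    return overall, per_discipline

# Toy example with three graded answers:
overall, per_disc = mmmu_accuracy([
    ("The shaded region matches option B.", "B", "Science"),
    ("Reasoning... so the answer is C.", "D", "Science"),
    ("A", "A", "Business"),
])
print(f"accuracy={overall:.3f}")  # 0.667
print(per_disc)                   # {'Science': 0.5, 'Business': 1.0}
```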
Index Score: 66.9
- Adoption: 76
- Quality: 90
- Freshness: 88
- Citations: 74
- Engagement: 0