MMMU
by CUHK / Waterloo · free · Last verified 2026-03-01
MMMU is a challenging multimodal benchmark designed to evaluate large multimodal models on expert-level tasks. It contains over 11,500 college-level problems spanning six core disciplines (30 subjects), posed as multiple-choice and open-ended questions that require models to combine deep subject knowledge with visual perception and detailed reasoning.
https://mmmu-benchmark.github.io
B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: A · Citations: B+ · Engagement: F
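For concreteness, here is a minimal sketch of pulling one MMMU subject with the Hugging Face `datasets` library. The `MMMU/MMMU` hub id, the per-subject config names (e.g. `Accounting`), and the field names follow the benchmark's public release, but treat them as assumptions and check the dataset card before relying on them.

```python
# Minimal sketch: load one MMMU subject via Hugging Face `datasets`.
# The hub id "MMMU/MMMU", the per-subject configs, and the field names
# below are assumptions taken from the public release; verify against
# the dataset card.
from datasets import load_dataset

# Each of the 30 subjects is its own config. "validation" is the split
# usually reported in papers; the test split withholds gold answers.
ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")

sample = ds[0]
print(sample["question"])  # question text; may reference attached images
print(sample["options"])   # candidate answers (multiple-choice items)
print(sample["answer"])    # gold answer letter, e.g. "B"
```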
Specifications
- License
- Apache-2.0
- Pricing
- free
- Capabilities
- evaluating expert-level multimodal reasoning, assessing visual question answering in specialized domains, benchmarking large multimodal models (LMMs), testing knowledge across humanities, sciences, and engineering, measuring few-shot learning on complex problems, analyzing model performance on problems requiring chain-of-thought reasoning, providing a standardized test for college-level AI capabilities
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro
- Metrics
- accuracy, per-discipline-accuracy (see the scoring sketch after this list)
- Methodology
- College-level multiple-choice and open-ended questions with image inputs across 30 subjects. Tests both visual understanding and domain knowledge.
- Last Run
- 2026-03-01
- Tags
- benchmark, evaluation, multimodal, reasoning, expert-level, lmm-evaluation, visual-question-answering, vqa, college-level, science-reasoning, chain-of-thought
- Added
- 2026-03-17
- Completeness
- 90%
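Per the Methodology and Metrics entries above, scoring reduces to extracting a final choice from the model's (possibly chain-of-thought) output and averaging correctness overall and within each discipline. The sketch below is a hypothetical harness fragment, not the official MMMU evaluation code; the extraction regex and the record shape are assumptions.

```python
# Hypothetical scoring sketch for MMMU-style results: extract a final
# answer letter from free-form model output, then compute overall and
# per-discipline accuracy. Not the official harness; the data shapes
# and the extraction heuristic are assumptions.
import re
from collections import defaultdict

def extract_choice(output: str) -> str:
    """Take the last standalone A-E letter as the model's final choice."""
    letters = re.findall(r"\b([A-E])\b", output.upper())
    return letters[-1] if letters else ""

def mmmu_accuracy(records):
    """records: iterable of (model_output, gold_letter, discipline)."""
    correct, total = defaultdict(int), defaultdict(int)
    for output, gold, discipline in records:
        total[discipline] += 1
        if extract_choice(output) == gold.strip().upper():
            correct[discipline] += 1
    overall = sum(correct.values()) / max(sum(total.values()), 1)
    per_discipline = {d: correct[d] / total[d] for d in total}
    return overall, per_discipline

# Toy example with three graded answers:
overall, per_disc = mmmu_accuracy([
    ("The shaded region matches option B.", "B", "Science"),
    ("Reasoning... so the answer is C.", "D", "Science"),
    ("A", "A", "Business"),
])
print(f"accuracy={overall:.3f}")  # 0.667
print(per_disc)                   # {'Science': 0.5, 'Business': 1.0}
```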
Index Score: 66.9
- Adoption: 76
- Quality: 90
- Freshness: 88
- Citations: 74
- Engagement: 0