
MMLU

by UC Berkeley · free · Last verified 2026-04-24

MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark covering 57 academic subjects from elementary to professional level, including STEM, law, medicine, and social sciences. It became the standard for measuring general knowledge breadth in LLMs and is included in virtually every model evaluation suite.

https://github.com/hendrycks/test
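
Each MMLU item is a four-option multiple-choice question tagged with one of the 57 subjects. Below is a minimal sketch for loading and inspecting the test split, assuming the Hugging Face mirror cais/mmlu; the repository linked above distributes the same questions as per-subject CSV files.

```python
# Sketch: inspect MMLU items. Assumes the Hugging Face mirror "cais/mmlu";
# the hendrycks/test repo ships the same data as per-subject CSVs.
from datasets import load_dataset

# The "all" config pools every subject; a single subject such as
# "college_physics" can be loaded by name instead.
mmlu = load_dataset("cais/mmlu", "all", split="test")

item = mmlu[0]
print(item["subject"])   # e.g. "abstract_algebra"
print(item["question"])  # question stem
print(item["choices"])   # list of four answer options
print(item["answer"])    # integer index (0-3) of the correct option
```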
Overall Grade: C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F

Specifications

License: Proprietary
Pricing: free
Capabilities: —
Integrations: —
Use Cases: —
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics: accuracy, 5-shot-accuracy, per-subject-accuracy
Methodology: Multiple-choice questions across 57 subjects, evaluated with 0-shot and 5-shot prompting; models select from four answer options per question (see the sketch after this table).
Last Run: 2026-02-15
Tags: benchmark, knowledge, multitask, academic, comprehensive, standard
Added: 2026-04-24
Completeness: 60%
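
The sketch referenced in the methodology row: a minimal 5-shot evaluation loop, assuming the field names shown in the loading sketch above and a hypothetical `ask_model(prompt)` callable that returns one of "A" through "D". The prompt layout approximates the format used in the hendrycks/test reference code rather than reproducing it verbatim.

```python
# Minimal 5-shot MMLU scoring sketch. `ask_model` is a hypothetical stand-in
# for whatever model API is under evaluation; it must return "A".."D".
CHOICE_LABELS = ["A", "B", "C", "D"]

def format_question(q: dict, with_answer: bool) -> str:
    """Render one question; demonstrations include the gold answer letter."""
    lines = [q["question"]]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, q["choices"])]
    lines.append("Answer:" + (f" {CHOICE_LABELS[q['answer']]}" if with_answer else ""))
    return "\n".join(lines)

def build_prompt(subject: str, shots: list[dict], q: dict) -> str:
    """Five answered demonstrations (dev split) followed by the test question."""
    header = ("The following are multiple choice questions (with answers) about "
              f"{subject.replace('_', ' ')}.\n\n")
    demos = "\n\n".join(format_question(s, with_answer=True) for s in shots[:5])
    return header + demos + "\n\n" + format_question(q, with_answer=False)

def subject_accuracy(ask_model, shots: list[dict], test_qs: list[dict]) -> float:
    """Per-subject accuracy: fraction of items answered with the gold letter."""
    correct = sum(
        ask_model(build_prompt(q["subject"], shots, q)) == CHOICE_LABELS[q["answer"]]
        for q in test_qs
    )
    return correct / len(test_qs)
```

0-shot evaluation is the same loop with an empty demonstration list; the overall accuracy metric averages across all questions, while per-subject-accuracy reports this value for each of the 57 subjects.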

Index Score: 44

Adoption: 50
Quality: 70
Freshness: 80
Citations: 40
Engagement: 0
