MMLU
by UC Berkeley · free · Last verified 2026-04-24
MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark covering 57 academic subjects from elementary to professional level, including STEM, law, medicine, and social sciences. It became the standard for measuring general knowledge breadth in LLMs and is included in virtually every model evaluation suite.
https://github.com/hendrycks/test
C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F
Specifications
- License
- MIT
- Pricing
- free
- Capabilities
- Integrations
- Use Cases
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics
- accuracy, 5-shot-accuracy, per-subject-accuracy
- Methodology
- Multiple-choice questions across 57 subjects, evaluated with 0-shot and 5-shot prompting. Each question offers four answer options, and the model selects one.
- Last Run
- 2026-02-15
- Tags
- benchmark, knowledge, multitask, academic, comprehensive, standard
- Added
- 2026-04-24
- Completeness
- 60%
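The methodology above (few-shot multiple-choice prompting, scored by accuracy and per-subject accuracy) can be sketched roughly as follows. This is a minimal illustration, not the benchmark's official harness: the dataset rows, field names, and prompt header are assumptions chosen to mirror the described format (question, four lettered options, and an "Answer:" line for the model to complete).

```python
# Hypothetical sketch of MMLU-style 5-shot evaluation scaffolding.
# Field names ("question", "options", "answer") are assumed, not taken
# from the official repository.

CHOICES = "ABCD"

def format_example(question, options, answer=None):
    """Render one question; include the answer only for few-shot demos."""
    lines = [question] + [f"{c}. {o}" for c, o in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, shots, test_item):
    """k-shot prompt: subject header, k solved demos, then the test question."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n")
    demos = [format_example(s["question"], s["options"], s["answer"])
             for s in shots]
    test = format_example(test_item["question"], test_item["options"])
    return header + "\n\n".join(demos + [test])

def per_subject_accuracy(records):
    """records: iterable of (subject, predicted, gold) -> {subject: accuracy}."""
    totals, correct = {}, {}
    for subject, pred, gold in records:
        totals[subject] = totals.get(subject, 0) + 1
        correct[subject] = correct.get(subject, 0) + (pred == gold)
    return {s: correct[s] / totals[s] for s in totals}
```

Overall accuracy is then the mean over all questions (or, in some reports, the unweighted mean of the per-subject accuracies); the listing does not say which convention this index uses.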
Index Score: 44
- Adoption: 50
- Quality: 70
- Freshness: 80
- Citations: 40
- Engagement: 0