MMLU-Pro
by TIGER-Lab · open-source · Last verified 2026-03-01
Enhanced version of MMLU with harder questions, 10 answer choices instead of 4, and reduced sensitivity to prompt formatting. Provides more discriminative evaluation of frontier model capabilities with a focus on reasoning-intensive questions.
https://github.com/TIGER-AI-Lab/MMLU-Pro
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: A · Citations: B+ · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: model-evaluation, advanced-knowledge-testing, reasoning-assessment
- Integrations: lm-eval-harness
- Use Cases: frontier-model-comparison, reasoning-evaluation, research
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics: accuracy, 5-shot-accuracy
- Methodology: Multiple-choice questions with 10 options across augmented MMLU subjects. Models are evaluated with chain-of-thought prompting to test reasoning depth.
- Last Run: 2026-02-20
- Tags: benchmark, evaluation, knowledge, reasoning, advanced
- Added: 2026-03-17
- Completeness: 100%
Index Score: 67.2
Adoption: 78 · Quality: 90 · Freshness: 88 · Citations: 72 · Engagement: 0