Massive Multitask Language Understanding (MMLU)
A comprehensive benchmark that measures an AI model's knowledge across 57 subjects, from the humanities to STEM. It assesses a model's understanding and reasoning in zero-shot and few-shot settings, making it a widely used test of general language-model capability.
https://huggingface.co/datasets/cais/mmlu
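As a minimal sketch of how few-shot MMLU prompts are typically assembled: each question has four answer choices labeled A–D, and development-set examples (with answers) are prepended before the unanswered test question. The example rows below are illustrative, not drawn from the real dataset; the actual rows can be loaded with the `datasets` library from the URL above.

```python
# Few-shot prompt construction for MMLU-style multiple-choice questions.
# Assumes each row is a dict with "question" (str), "choices" (list of 4
# strings), and "answer" (int index 0-3), matching the cais/mmlu schema.

CHOICE_LABELS = ["A", "B", "C", "D"]

def format_question(row: dict, include_answer: bool) -> str:
    """Render one question with labeled choices; append the answer for shots."""
    lines = [row["question"]]
    for label, choice in zip(CHOICE_LABELS, row["choices"]):
        lines.append(f"{label}. {choice}")
    suffix = f" {CHOICE_LABELS[row['answer']]}" if include_answer else ""
    lines.append("Answer:" + suffix)
    return "\n".join(lines)

def build_prompt(dev_rows: list, test_row: dict, subject: str) -> str:
    """Prepend answered dev-set shots, then the unanswered test question."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    shots = "\n\n".join(format_question(r, True) for r in dev_rows)
    return header + shots + "\n\n" + format_question(test_row, False)
```

The model's next-token prediction after the trailing "Answer:" is then compared against the gold label; zero-shot evaluation is the same prompt with an empty list of dev rows.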
Specifications
- API Available: No
- Tags: evaluation-benchmark, multitask, knowledge, reasoning, llm-evaluation, zero-shot, few-shot
- Added: 2026-03-25