Humanity's Last Exam
by CAIS · open-source · Last verified 2026-03-01
Crowdsourced benchmark of extremely difficult questions contributed by domain experts worldwide, designed to be the hardest public evaluation of AI capabilities. Covers advanced topics across science, mathematics, philosophy, and specialized professional knowledge.
https://lastexam.ai
Overall grade: B (Above Average)
Adoption: B · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: model-evaluation, frontier-testing, expert-level-assessment
- Integrations: lm-eval-harness (see the usage sketch after this list)
- Use Cases: frontier-model-evaluation, capability-boundary-testing, research
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: accuracy, per-domain-accuracy (a metric sketch follows this list)
- Methodology: Expert-contributed questions spanning 100+ academic disciplines. Each question is verified by multiple domain experts, and models are evaluated on both accuracy and reasoning quality.
- Last Run: 2026-03-10
- Tags: benchmark, evaluation, frontier, expert, reasoning
- Added: 2026-03-17
- Completeness: 100%
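The lm-eval-harness integration listed above suggests runs can be scripted through the harness's Python API. The sketch below is hypothetical: the task identifier `hle` and the model checkpoint are assumptions, not confirmed by this entry, so check the tasks registered in your lm-eval install before running.

```python
# Hypothetical sketch of running HLE via lm-eval-harness's Python API.
# The task name "hle" is an ASSUMPTION -- verify the registered task id
# in your install (e.g. with `lm_eval --tasks list`) before running.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face backend
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",  # example checkpoint
    tasks=["hle"],  # assumed task id, not confirmed by this entry
    batch_size=8,
)
print(results["results"])  # per-task metric dict produced by the harness
```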
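The two reported metrics are simple to reproduce from per-question grading output. A minimal sketch follows, assuming graded records carry `domain` and `correct` fields; these field names are illustrative, not the benchmark's actual output format.

```python
# Minimal sketch of the two listed metrics: overall accuracy and
# per-domain accuracy. Record fields are illustrative assumptions.
from collections import defaultdict

def accuracy_by_domain(records):
    """Return (overall accuracy, {domain: accuracy}) from graded records."""
    totals, hits = defaultdict(int), defaultdict(int)
    for rec in records:
        totals[rec["domain"]] += 1
        hits[rec["domain"]] += int(rec["correct"])
    per_domain = {d: hits[d] / totals[d] for d in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_domain

# Toy data only, not real benchmark results:
records = [
    {"domain": "mathematics", "correct": True},
    {"domain": "mathematics", "correct": False},
    {"domain": "philosophy", "correct": True},
]
print(accuracy_by_domain(records))
# (0.666..., {'mathematics': 0.5, 'philosophy': 1.0})
```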
Index Score: 60.2
- Adoption: 62
- Quality: 92
- Freshness: 94
- Citations: 68
- Engagement: 0