Humanity's Last Exam
by CAIS · free · Last verified 2026-03-01
Humanity's Last Exam is a crowdsourced benchmark designed to rigorously test the limits of advanced AI systems. It comprises extremely difficult questions contributed by domain experts across fields such as science, mathematics, and philosophy, and serves as a public evaluation of frontier-model capabilities in complex reasoning and specialized knowledge.
https://lastexam.ai
Index grade: B (Above Average)
Adoption: B · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License
- CC-BY-4.0
- Pricing
- free
- Capabilities
- frontier-model-evaluation, expert-level-reasoning-assessment, cross-domain-knowledge-synthesis, complex-problem-solving-benchmarking, identification-of-model-weaknesses, agi-capability-tracking, qualitative-safety-analysis, out-of-distribution-robustness-testing
- Integrations
- Use Cases
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- accuracy, per-domain-accuracy
- Methodology
- Expert-contributed questions spanning 100+ academic disciplines. Each question verified by multiple domain experts. Models evaluated on both accuracy and reasoning quality.
- Last Run
- 2026-03-10
- Tags
- benchmark, evaluation, frontier-testing, expert-level, reasoning, agi-safety, llm-testing, knowledge-based-qa, problem-solving, multidisciplinary, ai-capability-assessment
- Added
- 2026-03-17
- Completeness
- 0.9%
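The listed metrics (accuracy and per-domain-accuracy) can be sketched as a simple aggregation over graded responses. The record shape below, `(domain, is_correct)` pairs, is an assumption for illustration; the benchmark's actual data format is not published on this card.

```python
from collections import defaultdict

def score_responses(results):
    """Aggregate graded responses into overall and per-domain accuracy.

    `results` is a list of (domain, is_correct) pairs -- a hypothetical
    record shape, not the benchmark's published format.
    """
    per_domain = defaultdict(lambda: [0, 0])  # domain -> [correct, total]
    for domain, is_correct in results:
        per_domain[domain][0] += int(is_correct)
        per_domain[domain][1] += 1

    overall_correct = sum(c for c, _ in per_domain.values())
    overall_total = sum(t for _, t in per_domain.values())
    return {
        "accuracy": overall_correct / overall_total,
        "per_domain_accuracy": {
            d: c / t for d, (c, t) in per_domain.items()
        },
    }

# Toy example with four graded answers across three domains:
scores = score_responses([
    ("math", True), ("math", False),
    ("philosophy", True), ("chemistry", False),
])
print(scores["accuracy"])                      # 0.5
print(scores["per_domain_accuracy"]["math"])   # 0.5
```

Reporting per-domain accuracy alongside the overall number is what lets a benchmark spanning 100+ disciplines surface specific model weaknesses rather than a single blended score.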
Index Score: 60.2
- Adoption: 62
- Quality: 92
- Freshness: 94
- Citations: 68
- Engagement: 0
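The composite Index Score is presumably a weighted combination of the five component scores. The weights below are pure assumptions for illustration (the directory does not publish its formula), and indeed they yield 65.9 rather than the listed 60.2, so the real weighting must differ.

```python
# Hypothetical weights -- assumptions for illustration only; the
# directory's actual Index Score formula is not published here.
WEIGHTS = {
    "adoption": 0.30,
    "quality": 0.25,
    "freshness": 0.15,
    "citations": 0.15,
    "engagement": 0.15,
}

# Component scores as listed on this card.
COMPONENTS = {
    "adoption": 62,
    "quality": 92,
    "freshness": 94,
    "citations": 68,
    "engagement": 0,
}

def index_score(components, weights):
    """Weighted mean of component scores (weights sum to 1.0)."""
    return sum(weights[k] * components[k] for k in weights)

print(round(index_score(COMPONENTS, WEIGHTS), 1))  # 65.9
```

Note how the zero Engagement score drags any weighted mean well below the strong Quality and Freshness components, which is consistent with a composite of 60.2 despite three components in the 60s to 90s.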