BenchmarkLLMs v1.0

Humanity's Last Exam

by CAIS · open-source · Last verified 2026-03-01

Crowdsourced benchmark of extremely difficult questions contributed by domain experts worldwide, designed to be the hardest public evaluation of AI capabilities. Covers advanced topics across science, mathematics, philosophy, and specialized professional knowledge.

https://lastexam.ai
Overall Grade: B (Above Average)
Adoption: B · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: open-source
Capabilities: model-evaluation, frontier-testing, expert-level-assessment
Integrations: lm-eval-harness (see the sketch after this block)
Use Cases: frontier-model-evaluation, capability-boundary-testing, research
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics: accuracy, per-domain-accuracy
Methodology: Expert-contributed questions spanning 100+ academic disciplines. Each question verified by multiple domain experts. Models evaluated on both accuracy and reasoning quality.
Last Run: 2026-03-10
Tags: benchmark, evaluation, frontier, expert, reasoning
Added: 2026-03-17
Completeness: 100%
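
Since lm-eval-harness is the only listed integration, a minimal sketch of an evaluation run through the harness's Python API is shown below. This is not the benchmark's documented setup: the task id `hle` and the checkpoint name are assumptions, so check `lm_eval --tasks list` in your harness version for the actual registered task.

```python
# Minimal sketch: scoring a model on the benchmark via lm-eval-harness.
# The task id "hle" and the checkpoint are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # HuggingFace model backend
    model_args="pretrained=your-org/your-model",  # hypothetical checkpoint
    tasks=["hle"],          # assumed task id for Humanity's Last Exam
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) live under results["results"];
# per-domain accuracy would be aggregated from these per-task entries.
for task, metrics in results["results"].items():
    print(task, metrics)
```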

Index Score: 60.2

Adoption: 62 · Quality: 92 · Freshness: 94 · Citations: 68 · Engagement: 0
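
For intuition, the composite score is presumably some weighted combination of the five sub-scores above. The listing does not publish its weighting, and an unweighted mean of the sub-scores (63.2) does not match the reported 60.2, so the sketch below uses hypothetical weights purely to illustrate the mechanism.

```python
# Sketch of a weighted composite index over the five sub-scores shown
# above. The weights are assumptions for illustration only; the site's
# actual formula is unpublished and evidently differs (it reports 60.2).
subscores = {
    "adoption": 62,
    "quality": 92,
    "freshness": 94,
    "citations": 68,
    "engagement": 0,
}

weights = {  # hypothetical weights, summing to 1.0
    "adoption": 0.30,
    "quality": 0.20,
    "freshness": 0.15,
    "citations": 0.15,
    "engagement": 0.20,
}

index_score = sum(subscores[k] * weights[k] for k in subscores)
print(f"{index_score:.1f}")  # 61.3 under these assumed weights
```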
