
Humanity's Last Exam

by CAIS · free · Last verified 2026-03-01

Humanity's Last Exam is a crowdsourced benchmark designed to rigorously test the limits of advanced AI systems. It comprises extremely difficult questions contributed by domain experts across fields such as science, mathematics, and philosophy, and serves as a public evaluation of frontier models' complex reasoning and specialized knowledge.

https://lastexam.ai
Overall Grade: B (Above Average)
Adoption: B · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F

Specifications

License
CC-BY-4.0
Pricing
free
Capabilities
frontier-model-evaluation, expert-level-reasoning-assessment, cross-domain-knowledge-synthesis, complex-problem-solving-benchmarking, identification-of-model-weaknesses, agi-capability-tracking, qualitative-safety-analysis, out-of-distribution-robustness-testing
Integrations
Use Cases
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
accuracy, per-domain-accuracy
Methodology
Expert-contributed questions spanning 100+ academic disciplines. Each question verified by multiple domain experts. Models evaluated on both accuracy and reasoning quality.
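A minimal sketch of how the two listed metrics (accuracy and per-domain accuracy) might be computed over an HLE-style question set. The record keys `domain`, `prompt`, and `answer`, and the `model_answer` callable, are illustrative assumptions rather than the benchmark's published schema, and exact-match grading here stands in for the expert review of reasoning quality described above.

```python
from collections import defaultdict

def score(questions, model_answer):
    """Compute overall and per-domain accuracy for an HLE-style
    question set.

    `questions`: iterable of dicts with assumed keys
        'domain', 'prompt', 'answer'.
    `model_answer`: callable mapping a prompt string to the
        model's answer string.
    Exact string matching is a simplification of real grading.
    """
    totals = defaultdict(int)   # questions seen per domain
    correct = defaultdict(int)  # correct answers per domain
    for q in questions:
        totals[q["domain"]] += 1
        if model_answer(q["prompt"]).strip() == q["answer"].strip():
            correct[q["domain"]] += 1
    per_domain = {d: correct[d] / totals[d] for d in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return overall, per_domain
```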
Last Run
2026-03-10
Tags
benchmark, evaluation, frontier-testing, expert-level, reasoning, agi-safety, llm-testing, knowledge-based-qa, problem-solving, multidisciplinary, ai-capability-assessment
Added
2026-03-17
Completeness
90%

Index Score

60.2
Adoption
62
Quality
92
Freshness
94
Citations
68
Engagement
0
