BenchmarkLLMs v1.0

Humanity's Last Exam

by CAIS · open-source · Last verified 2026-03-01

Crowdsourced benchmark of extremely difficult questions contributed by domain experts worldwide, designed to be the hardest public evaluation of AI capabilities. Covers advanced topics across science, mathematics, philosophy, and specialized professional knowledge.

https://lastexam.ai
Overall Grade: B (Above Average)
Adoption: B · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: open-source
Capabilities: model-evaluation, frontier-testing, expert-level-assessment
Integrations: lm-eval-harness (see the sketch after this block)
Use Cases: frontier-model-evaluation, capability-boundary-testing, research
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics: accuracy, per-domain-accuracy
Methodology: Expert-contributed questions spanning 100+ academic disciplines. Each question verified by multiple domain experts. Models evaluated on both accuracy and reasoning quality.
Last Run: 2026-03-10
Tags: benchmark, evaluation, frontier, expert, reasoning
Added: 2026-03-17
Completeness: 100%
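
Since lm-eval-harness is the only listed integration, a minimal sketch of an evaluation run through the harness's Python API is shown below. This is not the benchmark's documented setup: the task id `hle` and the checkpoint name are assumptions, so check `lm_eval --tasks list` in your harness version for the actual registered task.

```python
# Minimal sketch: scoring a model on the benchmark via lm-eval-harness.
# The task id "hle" and the checkpoint are assumptions for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # HuggingFace model backend
    model_args="pretrained=your-org/your-model",  # hypothetical checkpoint
    tasks=["hle"],          # assumed task id for Humanity's Last Exam
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) live under results["results"];
# per-domain accuracy would be aggregated from these per-task entries.
for task, metrics in results["results"].items():
    print(task, metrics)
```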

Index Score: 60.2

Adoption: 62 · Quality: 92 · Freshness: 94 · Citations: 68 · Engagement: 0
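
For intuition, the composite score is presumably some weighted combination of the five sub-scores above. The listing does not publish its weighting, and an unweighted mean of the sub-scores (63.2) does not match the reported 60.2, so the sketch below uses hypothetical weights purely to illustrate the mechanism.

```python
# Sketch of a weighted composite index over the five sub-scores shown
# above. The weights are assumptions for illustration only; the site's
# actual formula is unpublished and evidently differs (it reports 60.2).
subscores = {
    "adoption": 62,
    "quality": 92,
    "freshness": 94,
    "citations": 68,
    "engagement": 0,
}

weights = {  # hypothetical weights, summing to 1.0
    "adoption": 0.30,
    "quality": 0.20,
    "freshness": 0.15,
    "citations": 0.15,
    "engagement": 0.20,
}

index_score = sum(subscores[k] * weights[k] for k in subscores)
print(f"{index_score:.1f}")  # 61.3 under these assumed weights
```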
