Benchmark · LLMs · v1.0

ARC Challenge

by Allen AI · open-source · Last verified 2026-03-01

The AI2 Reasoning Challenge (ARC) is a benchmark of multiple-choice, grade-school science questions that require commonsense reasoning and world knowledge. Its Challenge set contains only questions that simple retrieval and word co-occurrence methods fail to answer correctly.

https://allenai.org/data/arc
B+ (Good)
Adoption: A · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: CC-BY-SA-4.0
Pricing: open-source
Capabilities: model-evaluation, reasoning-testing, science-knowledge-assessment
Integrations: lm-eval-harness, helm
Use Cases: model-comparison, reasoning-evaluation, education-ai-assessment
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics: accuracy, 25-shot-accuracy
Methodology: Multiple-choice science questions from standardized tests. Challenge set filtered to exclude questions solvable by simple baselines.
Last Run: 2026-01-20
Tags: benchmark, evaluation, science, reasoning, commonsense
Added: 2026-03-17
Completeness: 100%
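
The accuracy and 25-shot-accuracy metrics listed above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Item` record and a simple "Question / choices / Answer" prompt template; these names and the template are illustrative only, and lm-eval-harness and HELM each define their own prompt formats and scoring rules, which may differ. The sample question in the usage note is made up and is not taken from ARC.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One ARC-style multiple-choice question (hypothetical field names)."""
    question: str
    choices: dict[str, str]  # label -> answer text, e.g. {"A": "...", "B": "..."}
    answer: str              # gold label, e.g. "B"


def format_item(item: Item, with_answer: bool) -> str:
    """Render one question; solved examples include the gold label."""
    lines = [f"Question: {item.question}"]
    lines += [f"{label}. {text}" for label, text in item.choices.items()]
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(shots: list[Item], target: Item) -> str:
    """k-shot prompt: k solved examples followed by the unsolved target.

    Passing 25 solved examples gives the 25-shot setting.
    """
    blocks = [format_item(shot, with_answer=True) for shot in shots]
    blocks.append(format_item(target, with_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], items: list[Item]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if len(predictions) != len(items):
        raise ValueError("need exactly one prediction per item")
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        pred.strip().upper() == item.answer
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

A possible usage, with an invented question: build `Item("Which object is attracted to a magnet?", {"A": "wooden spoon", "B": "iron nail"}, "B")`, pass solved examples plus this target to `build_prompt`, send the prompt to the model under test, and pass the returned labels to `accuracy`.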

Index Score

73.1
Adoption: 88
Quality: 82
Freshness: 70
Citations: 86
Engagement: 0
