BenchmarkLLMs v1.0

SimpleQA

by OpenAI · open-source · Last verified 2026-03-01

OpenAI's benchmark for measuring the factual accuracy of language models on short, unambiguous questions that each have a single correct answer. It tests whether models can accurately recall factual knowledge and appropriately abstain when uncertain.

https://openai.com/research/simple-qa
Overall Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A+ · Citations: B · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
model-evaluation, factuality-testing, calibration-assessment
Integrations
lm-eval-harness (see the runner sketch at the end of this entry)
Use Cases
factual-accuracy-testing, hallucination-measurement, model-calibration
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
accuracy, calibration, abstention-rate
Methodology
Simple factual questions with a verified single correct answer. Measures both accuracy and whether models appropriately decline to answer when uncertain (see the metric sketch after this section).
Last Run
2026-03-01
Tags
benchmark, evaluation, factuality, qa, knowledge
Added
2026-03-17
Completeness
100%
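
The listed metrics (accuracy, calibration, abstention-rate) can be computed from per-question grades of correct / incorrect / not attempted. The sketch below is illustrative, not the benchmark's official grader: the field names, and the use of a binned expected calibration error over stated confidences, are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Grade labels used in SimpleQA-style scoring; the schema here is illustrative.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"

@dataclass
class GradedAnswer:
    grade: str          # one of CORRECT / INCORRECT / NOT_ATTEMPTED
    confidence: float   # model's stated confidence in [0, 1], if collected

def simpleqa_style_metrics(answers: List[GradedAnswer], n_bins: int = 10) -> dict:
    """Accuracy, abstention rate, and a binned calibration error (assumed ECE)."""
    counts = Counter(a.grade for a in answers)
    total = len(answers)
    attempted = [a for a in answers if a.grade != NOT_ATTEMPTED]

    accuracy = counts[CORRECT] / total if total else 0.0
    abstention_rate = counts[NOT_ATTEMPTED] / total if total else 0.0

    # Expected calibration error over attempted answers: bin by stated
    # confidence, then compare mean confidence to empirical accuracy per bin.
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [a for a in attempted
                  if lo <= a.confidence < hi or (b == n_bins - 1 and a.confidence == 1.0)]
        if not in_bin:
            continue
        bin_acc = sum(a.grade == CORRECT for a in in_bin) / len(in_bin)
        bin_conf = sum(a.confidence for a in in_bin) / len(in_bin)
        ece += len(in_bin) / max(len(attempted), 1) * abs(bin_acc - bin_conf)

    return {"accuracy": accuracy,
            "abstention_rate": abstention_rate,
            "calibration_error": ece}
```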

Index Score
60.4
Adoption
68
Quality
86
Freshness
90
Citations
64
Engagement
0
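
The specifications above name lm-eval-harness as an integration. A minimal sketch of invoking it from Python follows; it assumes a SimpleQA task is registered under the name "simpleqa" in your installation, and the Hugging Face model ID is a placeholder, so check your harness's task registry for the exact identifier before running.

```python
# Minimal sketch of running a SimpleQA-style evaluation through lm-eval-harness.
# Assumptions: the task name "simpleqa" and the pretrained model below are
# placeholders, not confirmed by this listing.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                              # Hugging Face backend
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["simpleqa"],
    batch_size=8,
)

print(results["results"])  # per-task metrics dict
```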
