SimpleQA
by OpenAI · open-source · Last verified 2026-03-01
OpenAI's benchmark for measuring the factual accuracy of language models on simple, unambiguous questions with single correct answers. It tests whether models can accurately recall factual knowledge and appropriately abstain when uncertain.
https://openai.com/research/simple-qa
Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: model-evaluation, factuality-testing, calibration-assessment
- Integrations: lm-eval-harness (see the run sketch at the end of this entry)
- Use Cases: factual-accuracy-testing, hallucination-measurement, model-calibration
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: accuracy, calibration, abstention-rate
- Methodology: Simple factual questions with verified single correct answers. Measures both accuracy and whether models appropriately decline to answer when uncertain (metric sketch below).
- Last Run: 2026-03-01
- Tags: benchmark, evaluation, factuality, qa, knowledge
- Added: 2026-03-17
- Completeness: 100%
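The listing names three metrics (accuracy, calibration, abstention-rate) but includes no scoring code. SimpleQA's published grader buckets each response as correct, incorrect, or not attempted; the minimal Python sketch below mirrors that three-way split to show how the metrics could be computed. The record shape, field names, and the confidence-gap stand-in for calibration are all assumptions, not the benchmark's official implementation.

```python
from dataclasses import dataclass

@dataclass
class GradedAnswer:
    # Hypothetical record shape mirroring SimpleQA's three-way grading.
    grade: str          # "correct" | "incorrect" | "not_attempted"
    confidence: float   # model's stated confidence in [0, 1]

def score(results: list[GradedAnswer]) -> dict[str, float]:
    n = len(results)
    if n == 0:
        return {"accuracy": 0.0, "abstention_rate": 0.0, "calibration_gap": 0.0}
    correct = sum(r.grade == "correct" for r in results)
    abstained = sum(r.grade == "not_attempted" for r in results)
    attempted = [r for r in results if r.grade != "not_attempted"]
    acc_on_attempted = correct / len(attempted) if attempted else 0.0
    mean_conf = (sum(r.confidence for r in attempted) / len(attempted)
                 if attempted else 0.0)
    return {
        "accuracy": correct / n,
        "abstention_rate": abstained / n,
        # Gap between stated confidence and realized accuracy on attempted
        # questions: an assumed stand-in, not the official calibration metric.
        "calibration_gap": abs(mean_conf - acc_on_attempted),
    }

if __name__ == "__main__":
    demo = [
        GradedAnswer("correct", 0.9),
        GradedAnswer("incorrect", 0.8),
        GradedAnswer("not_attempted", 0.2),
    ]
    print(score(demo))
```

A well-calibrated model's stated confidence tracks its realized accuracy, so a small gap on attempted questions and a high abstention rate on questions it gets wrong are the behaviors this benchmark rewards.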
Index Score: 60.4
- Adoption: 68
- Quality: 86
- Freshness: 90
- Citations: 64
- Engagement: 0
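Since lm-eval-harness is the only integration listed, here is a minimal run sketch using that harness's Python API (`pip install lm-eval`). The task name `simpleqa` is an assumption and should be checked against `lm_eval --tasks list`; the model string is a small placeholder, not one of the models graded in this entry.

```python
# Minimal sketch: driving the benchmark through lm-eval-harness.
# "simpleqa" is an assumed task name; verify with `lm_eval --tasks list`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # HuggingFace backend
    model_args="pretrained=gpt2",  # placeholder model string
    tasks=["simpleqa"],            # assumed task name
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])          # per-task metric dict
```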