SimpleQA
by OpenAI · free · Last verified 2026-03-01
SimpleQA is a benchmark dataset developed by OpenAI to assess the factual accuracy of language models. It consists of simple, unambiguous questions that have a single, verifiable correct answer. The benchmark is designed to measure a model's ability to recall factual knowledge and, crucially, to abstain from answering when it is uncertain, providing a measure of its calibration.
https://openai.com/research/simple-qa

Overall grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License
- MIT
- Pricing
- free
- Capabilities
- Factual knowledge recall testing, Language model accuracy measurement, Model calibration assessment, Benchmarking against established models, Identifying knowledge gaps in LLMs, Evaluating model's ability to abstain from answering, Standardized scoring for model comparison
- Integrations
- Use Cases
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- accuracy, calibration, abstention-rate
- Methodology
- Simple factual questions with verified single correct answers. Measures both accuracy and whether models appropriately decline to answer when uncertain.
- Last Run
- 2026-03-01
- Tags
- benchmark, evaluation, factuality, qa, knowledge, openai, llm-evaluation, factual-accuracy, calibration, question-answering, knowledge-recall
- Added
- 2026-03-17
- Completeness
- 0.9%
Index Score: 60.4
- Adoption: 68
- Quality: 86
- Freshness: 90
- Citations: 64
- Engagement: 0
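The methodology above (grade each answer as correct, incorrect, or not attempted, then report accuracy, calibration, and abstention rate) can be sketched as a small scorer. This is an illustrative sketch, not OpenAI's official grading code; the label names and the use of accuracy-given-attempted as a calibration proxy are assumptions.

```python
# Hypothetical three-way labels, following the grading scheme the
# methodology describes (correct / incorrect / declined to answer).
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"

def score(labels: list[str]) -> dict[str, float]:
    """Aggregate per-question labels into the three listed metrics."""
    total = len(labels)
    correct = labels.count(CORRECT)
    attempted = total - labels.count(NOT_ATTEMPTED)
    return {
        # Accuracy: correct answers over all questions.
        "accuracy": correct / total,
        # Calibration proxy: accuracy restricted to attempted questions,
        # rewarding models that abstain rather than guess wrongly.
        "accuracy_given_attempted": correct / attempted if attempted else 0.0,
        # Abstention rate: fraction of questions the model declined.
        "abstention_rate": (total - attempted) / total,
    }

print(score([CORRECT, INCORRECT, NOT_ATTEMPTED, CORRECT]))
```

A model that abstains on questions it would have gotten wrong keeps its accuracy-given-attempted high even as raw accuracy stays flat, which is how the benchmark separates knowledge gaps from overconfidence.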