
SimpleQA

by OpenAI · free · Last verified 2026-03-01

SimpleQA is a benchmark dataset developed by OpenAI to assess the factual accuracy of language models. It consists of simple, unambiguous questions that have a single, verifiable correct answer. The benchmark is designed to measure a model's ability to recall factual knowledge and, crucially, to abstain from answering when it is uncertain, providing a measure of its calibration.
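As a rough illustration only, the sketch below shows how a SimpleQA-style item might be graded, assuming a simple string-match grader and a hand-picked list of abstention phrases in place of whatever grading procedure the benchmark itself uses; the Item class, grade function, and ABSTAIN_PHRASES constant are illustrative names, not part of the benchmark.

```python
# Hypothetical sketch of SimpleQA-style grading: each item pairs a short factual
# question with a single verified answer, and a response is graded as correct,
# incorrect, or not attempted (an abstention). Names and grading logic below are
# illustrative assumptions, not the benchmark's actual implementation.
from dataclasses import dataclass

@dataclass
class Item:
    question: str
    gold_answer: str

ABSTAIN_PHRASES = ("i don't know", "i am not sure", "i'm not sure")

def grade(response: str, item: Item) -> str:
    """Return 'correct', 'incorrect', or 'not_attempted' for one response."""
    text = response.strip().lower()
    if any(phrase in text for phrase in ABSTAIN_PHRASES):
        return "not_attempted"              # model declined to answer
    if item.gold_answer.lower() in text:
        return "correct"                    # gold answer appears in the response
    return "incorrect"

# Example: one item graded against three different replies.
item = Item("In which year was the Eiffel Tower completed?", "1889")
for reply in ["It was completed in 1889.", "It opened in 1900.", "I don't know."]:
    print(reply, "->", grade(reply, item))
```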

https://openai.com/research/simple-qa
Overall grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A+ · Citations: B · Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
Factual knowledge recall testing, Language model accuracy measurement, Model calibration assessment, Benchmarking against established models, Identifying knowledge gaps in LLMs, Evaluating a model's ability to abstain from answering, Standardized scoring for model comparison
Integrations
Use Cases
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
accuracy, calibration, abstention-rate
Methodology
Simple factual questions with verified single correct answers. Measures both accuracy and whether models appropriately decline to answer when uncertain (see the metrics sketch after this list).
Last Run
2026-03-01
Tags
benchmark, evaluation, factuality, qa, knowledge, openai, llm-evaluation, factual-accuracy, calibration, question-answering, knowledge-recall
Added
2026-03-17
Completeness
90%
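To make the metrics listed above (accuracy, calibration, abstention-rate) more concrete, here is a minimal sketch of aggregating per-item grades, assuming each response has already been labelled 'correct', 'incorrect', or 'not_attempted'; the aggregate helper and the exact metric definitions are assumptions, and calibration proper (comparing a model's stated confidence with its observed accuracy) is not shown.

```python
# Minimal sketch of aggregating SimpleQA-style metrics from per-item grades.
# Assumes each grade is one of 'correct', 'incorrect', or 'not_attempted';
# the benchmark's own metric definitions may differ.
from collections import Counter

def aggregate(grades: list[str]) -> dict[str, float]:
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "accuracy": counts["correct"] / total,                    # correct over all items
        "abstention_rate": counts["not_attempted"] / total,       # declined to answer
        "accuracy_given_attempted": (counts["correct"] / attempted) if attempted else 0.0,
    }

print(aggregate(["correct", "incorrect", "not_attempted", "correct", "correct"]))
# {'accuracy': 0.6, 'abstention_rate': 0.2, 'accuracy_given_attempted': 0.75}
```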

Index Score

60.4
Adoption
68
Quality
86
Freshness
90
Citations
64
Engagement
0
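Purely as a hypothetical sketch of how a composite index score could be combined from the component scores above: the weights below are invented for illustration (the directory does not publish its weighting), so the result does not reproduce the reported 60.4.

```python
# Illustrative weighted average of the listed component scores.
# The weights are assumptions made up for this sketch, not the directory's formula.
COMPONENTS = {"adoption": 68, "quality": 86, "freshness": 90, "citations": 64, "engagement": 0}
WEIGHTS = {"adoption": 0.25, "quality": 0.25, "freshness": 0.2, "citations": 0.2, "engagement": 0.1}

index_score = sum(COMPONENTS[k] * WEIGHTS[k] for k in COMPONENTS)
print(round(index_score, 1))  # 69.3 with these assumed weights; the page reports 60.4
```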
