Skip to main content
BenchmarkLLMsv2.0

TruthfulQA

by University of Oxford · open-source · Last verified 2026-03-01

Measures whether language models generate truthful answers to questions where humans are commonly mistaken. Covers health, law, finance, and politics topics where popular misconceptions and conspiracies create systematic failure modes.

https://github.com/sylinrl/TruthfulQA
B+
B+Good
Adoption: AQuality: AFreshness: B+Citations: AEngagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
model-evaluation, truthfulness-testing, factuality-assessment
Integrations
lm-eval-harness
Use Cases
safety-evaluation, factuality-benchmarking, model-alignment-testing
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
truthful-rate, informative-rate, truthful-and-informative
Methodology
Open-ended generation questions where common misconceptions exist. Evaluated by fine-tuned judge models for truthfulness and informativeness.
Last Run
2026-02-10
Tags
benchmark, evaluation, truthfulness, factuality, safety
Added
2026-03-17
Completeness
100%

Index Score

71
Adoption
82
Quality
86
Freshness
76
Citations
84
Engagement
0

Explore the full AI ecosystem on Agents as a Service