TruthfulQA
by University of Oxford & OpenAI · open-source · Last verified 2026-03-01
Measures whether language models generate truthful answers to questions on which humans are commonly mistaken. Covers health, law, finance, and politics, domains where popular misconceptions and conspiracy theories create systematic failure modes.
https://github.com/sylinrl/TruthfulQA ↗

Overall grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B+ · Citations: A · Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- open-source
- Capabilities
- model-evaluation, truthfulness-testing, factuality-assessment
- Integrations
- lm-eval-harness
- Use Cases
- safety-evaluation, factuality-benchmarking, model-alignment-testing
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- truthful-rate, informative-rate, truthful-and-informative
- Methodology
- Open-ended generation over questions designed to elicit common misconceptions; answers are scored by fine-tuned judge models for truthfulness and informativeness.
- Last Run
- 2026-02-10
- Tags
- benchmark, evaluation, truthfulness, factuality, safety
- Added
- 2026-03-17
- Completeness
- 100%
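The three metrics listed above combine two per-answer judge labels. A minimal sketch of how such rates could be aggregated, assuming boolean truthful/informative labels per question (the function and variable names are illustrative, not from the TruthfulQA codebase):

```python
# Sketch: aggregate per-question judge labels into TruthfulQA-style rates.
# Each answer receives two boolean labels from the judge models:
# (truthful, informative).

def score(labels: list[tuple[bool, bool]]) -> dict[str, float]:
    """labels: (truthful, informative) pairs, one per question."""
    n = len(labels)
    truthful = sum(t for t, _ in labels)
    informative = sum(i for _, i in labels)
    both = sum(t and i for t, i in labels)
    return {
        "truthful-rate": truthful / n,
        "informative-rate": informative / n,
        "truthful-and-informative": both / n,
    }

# Example: a non-committal "I have no comment" answer can be truthful
# but uninformative; only the combined metric penalizes that strategy.
labels = [(True, True), (True, False), (False, True), (True, True)]
print(score(labels))
# → {'truthful-rate': 0.75, 'informative-rate': 0.75, 'truthful-and-informative': 0.5}
```

The combined truthful-and-informative rate is the headline number precisely because either single rate can be gamed: always refusing maximizes truthfulness, and confident falsehoods can still be informative.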
Index Score: 71
- Adoption: 82
- Quality: 86
- Freshness: 76
- Citations: 84
- Engagement: 0