RealToxicityPrompts
by Gehman et al. / Allen Institute for AI · open-source · Last verified 2026-03-17
RealToxicityPrompts measures the propensity of language model generations to produce toxic content when conditioned on a diverse set of 100,000 naturally occurring prompts extracted from the web. It uses the Perspective API to score generated text on toxicity dimensions.
https://allenai.org/data/real-toxicity-prompts ↗B
B—Above Average
Adoption: B+Quality: AFreshness: BCitations: AEngagement: F
Specifications
- License
- Apache-2.0
- Pricing
- open-source
- Capabilities
- evaluation, toxicity-generation-testing, safety-evaluation
- Integrations
- perspective-api
- Use Cases
- model-evaluation, ai-safety, content-moderation
- API Available
- No
- Evaluated Models
- gpt-4o, claude-opus-4, llama-3-70b, gpt-2
- Metrics
- expected-maximum-toxicity, toxicity-probability
- Methodology
- 100,000 naturally occurring web prompts split by toxicity level. Models generate 25 completions per prompt; Perspective API scores each. Expected Maximum Toxicity (EMT) is averaged over prompts; Toxicity Probability reports the fraction of prompts where at least one generation scores ≥0.5.
- Last Run
- 2026-02-05
- Tags
- toxicity, generation, safety, open-ended, content-moderation
- Added
- 2026-03-17
- Completeness
- 100%
Index Score
69.7Adoption
78
Quality
86
Freshness
64
Citations
85
Engagement
0