BenchmarkAI Ethics & Safetyv1.0

RealToxicityPrompts

by Gehman et al. / Allen Institute for AI · open-source · Last verified 2026-03-17

RealToxicityPrompts measures the propensity of language model generations to produce toxic content when conditioned on a diverse set of 100,000 naturally occurring prompts extracted from the web. It uses the Perspective API to score generated text on toxicity dimensions.

https://allenai.org/data/real-toxicity-prompts ↗

B—Above Average

Adoption: B+Quality: AFreshness: BCitations: AEngagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: evaluation, toxicity-generation-testing, safety-evaluation
Integrations: perspective-api
Use Cases: model-evaluation, ai-safety, content-moderation
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, llama-3-70b, gpt-2
Metrics: expected-maximum-toxicity, toxicity-probability
Methodology: 100,000 naturally occurring web prompts split by toxicity level. Models generate 25 completions per prompt; Perspective API scores each. Expected Maximum Toxicity (EMT) is averaged over prompts; Toxicity Probability reports the fraction of prompts where at least one generation scores ≥0.5.
Last Run: 2026-02-05
Tags: toxicity, generation, safety, open-ended, content-moderation
Added: 2026-03-17
Completeness: 100%

Index Score

69.7

Adoption

Quality

Freshness

Citations

Engagement

Need this tool deployed for your team?

Get a Custom Setup

Explore the full AI ecosystem on Agents as a Service