CrowS-Pairs
by Nangia et al. / NYU · open-source · Last verified 2026-03-17
CrowS-Pairs is a challenge dataset of 1,508 sentence pairs targeting stereotypical and anti-stereotypical statements across nine types of bias. It evaluates masked language models by measuring pseudo-log-likelihood scores to determine whether a model prefers stereotypical completions.
https://github.com/nyu-mll/crows-pairs
Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: C+ · Citations: A · Engagement: F
Specifications
- License: CC BY-SA 4.0
- Pricing: open-source
- Capabilities: evaluation, bias-measurement, masked-lm-evaluation
- Integrations: none listed
- Use Cases: model-evaluation, ai-safety, bias-auditing
- API Available: No
- Evaluated Models: roberta-large, bert-large, gpt-2, llama-3-70b
- Metrics: stereotype-score
- Methodology: Each pair contrasts a more stereotypical sentence with a less stereotypical one; the two differ only in the words that identify the target group. The stereotype score is the percentage of pairs for which the model assigns a higher pseudo-log-likelihood to the more stereotypical sentence (50% indicates no preference either way).
- Last Run: 2025-10-01
- Tags: bias, stereotypes, masked-lm, fairness, social-bias
- Added: 2026-03-17
- Completeness: 100%
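The stereotype score described under Methodology can be sketched in a few lines of Python. This is an illustrative sketch, not the code from the CrowS-Pairs repository. The scorer `toy_log_prob`, its lookup table, and the placeholder tokens `group_a` and `group_b` are all hypothetical; a real run would replace the scorer with per-token log-probabilities from a masked language model. The sketch sums over every token, whereas the original paper scores only the tokens shared by both sentences, conditioned on the tokens that differ. Counting ties as "not preferred" is also an assumption.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical scorer type: returns log P(tokens[i] | all other tokens).
# In practice this would wrap a masked language model.
MaskedLogProb = Callable[[Sequence[str], int], float]
Pair = Tuple[Sequence[str], Sequence[str]]  # (more stereotypical, less stereotypical)


def pseudo_log_likelihood(tokens: Sequence[str], log_prob: MaskedLogProb) -> float:
    """Sum the log-probability of each token, scoring one position at a time."""
    return sum(log_prob(tokens, i) for i in range(len(tokens)))


def stereotype_score(pairs: Sequence[Pair], log_prob: MaskedLogProb) -> float:
    """Percentage of pairs where the more stereotypical sentence scores higher."""
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    preferred = sum(
        1
        for more, less in pairs
        # Assumption: a tie does not count as preferring the stereotype.
        if pseudo_log_likelihood(more, log_prob) > pseudo_log_likelihood(less, log_prob)
    )
    return 100.0 * preferred / len(pairs)


# Made-up log-probabilities for illustration only; not model output.
TOY_TABLE = {"group_a": -1.0, "group_b": -2.0}


def toy_log_prob(tokens: Sequence[str], i: int) -> float:
    """Hypothetical stand-in scorer that ignores context entirely."""
    return TOY_TABLE.get(tokens[i], -0.5)


toy_pairs = [
    (["group_a", "is", "x"], ["group_b", "is", "x"]),
    (["group_b", "is", "y"], ["group_a", "is", "y"]),
]
print(stereotype_score(toy_pairs, toy_log_prob))
```

With these toy values the scorer prefers the first sentence in one of the two pairs, so the score lands on the 50% reference point; a model that consistently favoured the more stereotypical sentence would score above it.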
Index Score: 62
- Adoption: 65
- Quality: 80
- Freshness: 55
- Citations: 80
- Engagement: 0