CrowS-Pairs
by Nangia et al. / NYU · free · Last verified 2026-03-17
CrowS-Pairs is a benchmark dataset for evaluating social bias in masked language models. It contains 1,508 sentence pairs across nine bias types, each pair contrasting a more stereotypical statement with a less stereotypical one. The benchmark measures how often a model prefers the more stereotypical sentence, using pseudo-log-likelihood scores.
https://github.com/nyu-mll/crows-pairs
Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: C+ · Citations: A · Engagement: F
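The pseudo-log-likelihood scoring mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not the benchmark's reference implementation. The `masked_log_prob` callable is a hypothetical stand-in for a masked language model that returns the log-probability of one token given the rest of the sentence. The `positions` argument is also an assumption, included so that only a subset of tokens (for example, the tokens shared by both sentences in a pair) can be scored.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical scorer type: given the token list and an index i, return
# log P(tokens[i] | all other tokens), e.g. from a masked LM with
# position i masked out.
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(
    tokens: Sequence[str],
    masked_log_prob: MaskedLogProb,
    positions: Optional[Sequence[int]] = None,
) -> float:
    """Sum the masked log-probabilities over the chosen token positions."""
    if positions is None:
        positions = range(len(tokens))  # default: score every token
    return sum(masked_log_prob(tokens, i) for i in positions)


def toy_scorer(tokens: Sequence[str], i: int) -> float:
    # Stand-in for a real model: every token gets probability 0.5.
    return math.log(0.5)


score = pseudo_log_likelihood(["the", "cat", "sat"], toy_scorer)
partial = pseudo_log_likelihood(["the", "cat", "sat"], toy_scorer, positions=[0, 2])
```

With a real model, `toy_scorer` would be replaced by a function that masks position `i`, runs the model, and reads off the log-probability of the original token.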
Specifications
- License
- CC BY-SA 4.0
- Pricing
- free
- Capabilities
- social-bias-evaluation, stereotype-detection-in-lms, masked-language-model-probing, pseudo-log-likelihood-scoring, comparative-model-analysis, bias-quantification, fairness-auditing
- API Available
- No
- Evaluated Models
- roberta-large, bert-large, gpt-2, llama-3-70b
- Metrics
- stereotype-score
- Methodology
- Each pair presents a more stereotypical and a less stereotypical sentence that differ only in the words identifying the target group. The stereotype score is the percentage of pairs for which the model assigns a higher pseudo-log-likelihood to the more stereotypical sentence (50% = no bias).
- Last Run
- 2025-10-01
- Tags
- bias, stereotypes, masked-lm, fairness, social-bias, nlp-benchmark, ai-ethics, model-evaluation, language-model-probing, dataset
- Added
- 2026-03-17
- Completeness
- 1%
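The stereotype score described under Methodology can be sketched as below, assuming each pair has already been scored. The function name `stereotype_score` and the example values are made up for illustration, and counting ties as "not preferring the stereotypical sentence" is an assumption rather than something the listing specifies.

```python
from typing import Iterable, Tuple


def stereotype_score(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Percentage of pairs where the more stereotypical sentence scores higher.

    Each item is (score_more_stereotypical, score_less_stereotypical),
    e.g. two pseudo-log-likelihoods from the same model.
    """
    pairs = list(pair_scores)
    if not pairs:
        raise ValueError("need at least one scored pair")
    # Assumption: a tie does not count as a stereotypical preference.
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * preferred / len(pairs)


# Made-up scores for four pairs; two of them favor the stereotypical sentence.
example = [(-12.1, -13.4), (-20.0, -18.5), (-9.7, -9.9), (-15.2, -15.0)]
result = stereotype_score(example)
print(result)  # 50.0
```

A result of 50.0 corresponds to the "no bias" point in the methodology; values above 50 indicate the model more often prefers the stereotypical sentence.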
Index Score
62
Adoption
65
Quality
80
Freshness
55
Citations
80
Engagement
0