Benchmark · AI Ethics & Safety · v1.0

CrowS-Pairs

by Nangia et al. / NYU · free · Last verified 2026-03-17

CrowS-Pairs is a benchmark dataset for evaluating social bias in masked language models. It contains 1,508 sentence pairs, each contrasting a more stereotypical statement with a less stereotypical one, across nine bias types. The benchmark uses pseudo-log-likelihood scores to measure how often a model prefers the more stereotypical sentence.
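
Pseudo-log-likelihood scoring can be illustrated with a minimal sketch. It sums the log-probability of each token given all the other tokens in the sentence. The `cond_prob` callable and the `toy_cond_prob` table are hypothetical stand-ins for a real masked language model, and the numbers are made up for illustration; this is not the benchmark's reference implementation.

```python
import math
from typing import Callable, Iterable, Optional, Sequence


def pseudo_log_likelihood(
    tokens: Sequence[str],
    cond_prob: Callable[[Sequence[str], int], float],
    positions: Optional[Iterable[int]] = None,
) -> float:
    """Sum log P(tokens[i] | all other tokens) over the chosen positions."""
    if positions is None:
        positions = range(len(tokens))
    return sum(math.log(cond_prob(tokens, i)) for i in positions)


def toy_cond_prob(tokens: Sequence[str], i: int) -> float:
    """Hypothetical stand-in for a masked LM: a fixed per-token probability table."""
    table = {"the": 0.9, "sat": 0.5}
    return table.get(tokens[i], 0.1)  # unknown tokens get a low probability


sentence = ["the", "cat", "sat"]

# Score every token in the sentence.
full = pseudo_log_likelihood(sentence, toy_cond_prob)

# Score only selected positions, e.g. the tokens two paired sentences share.
partial = pseudo_log_likelihood(sentence, toy_cond_prob, positions=[0, 2])
```

The optional `positions` argument is a design choice of this sketch: it allows scoring to be restricted to the tokens that both sentences in a pair have in common, so that the differing group words themselves do not dominate the comparison.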

https://github.com/nyu-mll/crows-pairs
B · Above Average
Adoption: B · Quality: A · Freshness: C+ · Citations: A · Engagement: F

Specifications

License
CC BY-SA 4.0
Pricing
free
Capabilities
social-bias-evaluation, stereotype-detection-in-lms, masked-language-model-probing, pseudo-log-likelihood-scoring, comparative-model-analysis, bias-quantification, fairness-auditing
API Available
No
Evaluated Models
roberta-large, bert-large, gpt-2, llama-3-70b
Metrics
stereotype-score
Methodology
Each pair presents a more stereotypical and a less stereotypical sentence that differ only in the words identifying the target group. The stereotype score is the percentage of pairs for which the model assigns a higher pseudo-log-likelihood to the more stereotypical sentence (50% = no bias).
Last Run
2025-10-01
Tags
bias, stereotypes, masked-lm, fairness, social-bias, nlp-benchmark, ai-ethics, model-evaluation, language-model-probing, dataset
Added
2026-03-17
Completeness
1%
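
The stereotype score described under Methodology can be sketched as a simple aggregation over scored pairs. The function name `stereotype_score`, the input layout (one `(stereotypical, less_stereotypical)` score tuple per pair), and the example numbers are assumptions made for illustration; treating ties as "not preferred" is also an assumption of this sketch.

```python
from typing import Iterable, Tuple


def stereotype_score(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Percentage of pairs where the more stereotypical sentence scores strictly higher."""
    pairs = list(pair_scores)
    if not pairs:
        raise ValueError("need at least one scored pair")
    # Count pairs in which the model prefers the more stereotypical sentence.
    preferred = sum(1 for stereo, less_stereo in pairs if stereo > less_stereo)
    return 100.0 * preferred / len(pairs)


# Made-up pseudo-log-likelihood scores: (more stereotypical, less stereotypical).
example_pairs = [(-10.2, -11.0), (-8.5, -8.1), (-12.0, -12.7), (-9.9, -9.4)]
score = stereotype_score(example_pairs)
print(score)  # 50.0
```

In this example the model prefers the more stereotypical sentence in two of four pairs, which gives the 50% value that the methodology treats as no bias.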

Index Score

62
Adoption
65
Quality
80
Freshness
55
Citations
80
Engagement
0
