WinoBias
by Zhao et al. / UCLA · free · Last verified 2026-03-17
WinoBias is a benchmark dataset for measuring gender bias in coreference resolution systems. It consists of sentence pairs in which a pronoun refers to a person identified by occupation, once in a pro-stereotypical pairing and once in an anti-stereotypical one, so that a model's reliance on gender stereotypes rather than on syntactic and semantic cues can be quantified.
https://github.com/uclanlp/corefBias
C+ (Average)
Adoption: B · Quality: B+ · Freshness: C+ · Citations: B+ · Engagement: F
Specifications
- License
- MIT
- Pricing
- free
- Capabilities
- gender bias measurement, coreference resolution evaluation, stereotype detection in language models, pronoun resolution analysis, fairness auditing for NLP, comparative model analysis
- API Available
- No
- Evaluated Models
- gpt-4o, claude-opus-4, spacy-lg
- Metrics
- f1-score, gender-bias-gap
- Methodology
- 3,160 sentences, half pro-stereotypical and half anti-stereotypical, built from two templates: Type 1, where resolving the pronoun requires world knowledge, and Type 2, where syntactic cues suffice. The gender bias gap is the F1 difference between the pro- and anti-stereotypical conditions; a smaller gap indicates less bias.
- Last Run
- 2025-09-15
- Tags
- bias, gender-bias, coreference, fairness, pronoun, nlp, ai-ethics, evaluation-dataset, responsible-ai, language-model-testing, english
- Added
- 2026-03-17
- Completeness
- 1%
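The gender-bias-gap metric described under Methodology can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official scorer: it assumes a simplified link-level F1 (one gold antecedent per sentence, with `None` meaning the system predicted no link), and the function names and toy predictions are made up for the example.

```python
from typing import Optional, Sequence


def link_f1(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Simplified F1 over pronoun-antecedent links (an assumption of this sketch).

    gold[i] is the correct antecedent for sentence i; pred[i] is the system's
    choice, or None if it linked the pronoun to nothing.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = sum(1 for g, p in zip(gold, pred) if p is not None and p == g)
    predicted = sum(1 for p in pred if p is not None)
    precision = correct / predicted if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gender_bias_gap(pro_gold, pro_pred, anti_gold, anti_pred) -> float:
    """F1 on the pro-stereotypical set minus F1 on the anti-stereotypical set."""
    return link_f1(pro_gold, pro_pred) - link_f1(anti_gold, anti_pred)


# Toy, hand-written predictions (not real model output).
gold = ["physician", "secretary", "developer", "nurse"]
pro_pred = ["physician", "secretary", "developer", "nurse"]  # all correct
anti_pred = ["physician", "physician", None, "nurse"]        # one wrong, one missing

gap = gender_bias_gap(gold, pro_pred, gold, anti_pred)
print(f"pro F1  = {link_f1(gold, pro_pred):.3f}")
print(f"anti F1 = {link_f1(gold, anti_pred):.3f}")
print(f"gap     = {gap:.3f}")
```

A gap near zero means the system performs about equally well on both conditions; a large positive gap means it does better when the pronoun matches the occupational stereotype.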
Index Score: 59.8
Adoption: 62 · Quality: 79 · Freshness: 52 · Citations: 77 · Engagement: 0