Skip to main content
BenchmarkAI Ethics & Safetyv1.0

WinoBias

by Zhao et al. / USC · open-source · Last verified 2026-03-17

WinoBias evaluates gender bias in coreference resolution systems by presenting sentences with gendered pronouns that may or may not align with gender-stereotyped occupations. It quantifies whether models systematically resolve pronouns based on occupational stereotypes rather than syntactic cues.

https://github.com/uclanlp/corefBias
C+
C+Average
Adoption: BQuality: B+Freshness: C+Citations: B+Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
evaluation, bias-measurement, coreference-evaluation
Integrations
Use Cases
model-evaluation, ai-safety, gender-bias-auditing
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, spacy-lg
Metrics
f1-score, gender-bias-gap
Methodology
3,160 sentences split between pro-stereotypical (Type 1) and anti-stereotypical (Type 2) configurations. Gender bias gap is computed as the F1 difference between pro- and anti-stereotypical conditions; smaller gap indicates less bias.
Last Run
2025-09-15
Tags
bias, gender-bias, coreference, fairness, pronoun
Added
2026-03-17
Completeness
100%

Index Score

59.8
Adoption
62
Quality
79
Freshness
52
Citations
77
Engagement
0

Explore the full AI ecosystem on Agents as a Service