Benchmark · AI Ethics & Safety · v1.0

WinoBias

by Zhao et al. / UCLA · free · Last verified 2026-03-17

WinoBias is a benchmark dataset designed to measure gender bias in coreference resolution systems. It consists of paired sentences in which a pronoun refers to a person described by occupation, once in a gender-stereotypical configuration and once in a non-stereotypical one. Comparing performance across the two quantifies how much a model relies on gender stereotypes rather than on the syntactic and semantic cues in the sentence.

https://github.com/uclanlp/corefBias
Grade: C+ (Average)
Adoption: B · Quality: B+ · Freshness: C+ · Citations: B+ · Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
gender bias measurement, coreference resolution evaluation, stereotype detection in language models, pronoun resolution analysis, fairness auditing for NLP, comparative model analysis
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, spacy-lg
Metrics
f1-score, gender-bias-gap
Methodology
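3,160 sentences in two template types: Type 1, where resolving the pronoun requires world knowledge, and Type 2, where it can be resolved from syntactic cues. Each type is split into pro-stereotypical and anti-stereotypical conditions. The gender bias gap is computed as the F1 difference between the pro- and anti-stereotypical conditions; a smaller gap indicates less bias.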
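The gap calculation described above can be illustrated with a minimal Python sketch. It makes several assumptions that are not part of this listing: each sentence has exactly one gold antecedent, a system either predicts one antecedent or abstains (None), and F1 is a simplified link-level score rather than the output of a full coreference scorer. The function names and the toy data are hypothetical and are not benchmark results.

```python
from typing import Optional, Sequence


def f1_score(gold: Sequence[str], predicted: Sequence[Optional[str]]) -> float:
    """Simplified link-level F1 (assumption: one gold antecedent per sentence).

    A prediction of None means the system linked the pronoun to nothing.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    attempted = sum(1 for p in predicted if p is not None)
    if correct == 0:
        return 0.0
    precision = correct / attempted
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def gender_bias_gap(pro_f1: float, anti_f1: float) -> float:
    """F1 on pro-stereotypical sentences minus F1 on anti-stereotypical ones."""
    return pro_f1 - anti_f1


# Toy data, invented purely for illustration.
gold_antecedents = ["developer", "nurse", "mechanic", "secretary"]
pro_predictions = ["developer", "nurse", "mechanic", "secretary"]
anti_predictions = ["developer", "designer", None, "secretary"]

pro_f1 = f1_score(gold_antecedents, pro_predictions)
anti_f1 = f1_score(gold_antecedents, anti_predictions)
gap = gender_bias_gap(pro_f1, anti_f1)
print(f"pro F1={pro_f1:.3f}  anti F1={anti_f1:.3f}  gap={gap:.3f}")
```

A gap near zero means the system performs about equally well in both conditions. A large positive gap means it does better when the correct antecedent matches the occupational stereotype.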
Last Run
2025-09-15
Tags
bias, gender-bias, coreference, fairness, pronoun, nlp, ai-ethics, evaluation-dataset, responsible-ai, language-model-testing, english
Added
2026-03-17
Completeness
1%

Index Score

59.8
Adoption
62
Quality
79
Freshness
52
Citations
77
Engagement
0
