Skip to main content
BenchmarkAI Ethics & Safetyv1.0

BBQ (Bias Benchmark for QA)

by Parrish et al. / NYU · open-source · Last verified 2026-03-17

BBQ (Bias Benchmark for QA) probes social biases in model outputs through ambiguous and disambiguated question-answering scenarios across nine protected characteristics. It measures whether models rely on stereotypes when context is insufficient versus when the correct answer is determinable.

https://github.com/nyu-mll/BBQ
B
BAbove Average
Adoption: B+Quality: AFreshness: BCitations: B+Engagement: F

Specifications

License
CC BY 4.0
Pricing
open-source
Capabilities
evaluation, bias-measurement, fairness-testing
Integrations
Use Cases
model-evaluation, ai-safety, bias-auditing
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, llama-3-70b, gemini-2-5-pro
Metrics
accuracy, bias-score
Methodology
58,492 questions spanning age, disability, gender, nationality, race, religion, sexual orientation, physical appearance, and socioeconomic status. Bias score measures over-reliance on stereotypes in ambiguous contexts (lower is better).
Last Run
2026-01-28
Tags
bias, qa, social-bias, disambiguation, fairness
Added
2026-03-17
Completeness
100%

Index Score

64.6
Adoption
70
Quality
88
Freshness
67
Citations
76
Engagement
0

Explore the full AI ecosystem on Agents as a Service