WinoGrande
by Allen AI · open-source · Last verified 2026-03-01
Large-scale dataset for commonsense coreference resolution inspired by Winograd schemas. Tests whether models can correctly resolve pronoun references based on world knowledge and commonsense reasoning in carefully constructed sentence pairs.
https://winogrande.allenai.org
Grade: B (Above Average)
Adoption: A · Quality: B+ · Freshness: B · Citations: A · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: model-evaluation, coreference-testing, commonsense-assessment
- Integrations: lm-eval-harness, helm
- Use Cases: model-comparison, commonsense-evaluation, language-understanding
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics: accuracy, 5-shot-accuracy
- Methodology: Binary-choice coreference resolution tasks. Models select which of two entities a pronoun refers to based on contextual and commonsense cues.
- Last Run: 2026-01-15
- Tags: benchmark, evaluation, commonsense, coreference, reasoning
- Added: 2026-03-17
- Completeness: 100%
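The binary-choice methodology and the accuracy metric listed above can be illustrated with a minimal sketch. It assumes each item marks the ambiguous slot with `_` and carries `sentence`, `option1`, `option2`, and `answer` fields; these field names, the two hand-written example items, and `toy_score` are assumptions made for illustration, not data or code from the benchmark. In a real evaluation `toy_score` would be replaced by a model-derived score such as a sentence log-likelihood.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def fill_blank(sentence: str, option: str) -> str:
    """Substitute a candidate entity into the first blank of the sentence."""
    return sentence.replace("_", option, 1)


def predict(item: Item, score: Callable[[str], float]) -> str:
    """Return "1" or "2" for whichever filled-in sentence scores higher."""
    s1 = score(fill_blank(item["sentence"], item["option1"]))
    s2 = score(fill_blank(item["sentence"], item["option2"]))
    return "1" if s1 >= s2 else "2"


def accuracy(items: List[Item], score: Callable[[str], float]) -> float:
    """Fraction of items where the predicted option matches the answer."""
    if not items:
        return 0.0
    correct = sum(predict(item, score) == item["answer"] for item in items)
    return correct / len(items)


def toy_score(text: str) -> float:
    """Hypothetical placeholder scorer: prefers the shorter sentence."""
    return -float(len(text))


# Hand-written illustrative items (not taken from the dataset).
EXAMPLES: List[Item] = [
    {
        "sentence": "The trophy did not fit in the suitcase because the _ was too big.",
        "option1": "trophy",
        "option2": "suitcase",
        "answer": "1",
    },
    {
        "sentence": "The trophy did not fit in the suitcase because the _ was too small.",
        "option1": "trophy",
        "option2": "suitcase",
        "answer": "2",
    },
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(EXAMPLES, toy_score):.2f}")
```

Because the two example sentences differ by a single word and have opposite answers, a scorer that ignores meaning (like the length-based placeholder) gets exactly one of them right, which is why chance performance on a binary-choice task of this kind sits at 50% accuracy.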
Index Score: 69.7
- Adoption: 84
- Quality: 78
- Freshness: 66
- Citations: 82
- Engagement: 0