
HellaSwag Dataset

by the University of Washington and the Allen Institute for AI · open-source · Last verified 2026-03-17

HellaSwag is an adversarially filtered commonsense natural language inference (NLI) benchmark in which a model must pick the most plausible completion of a sentence from four candidate endings. Humans score above 95% while early LLMs struggled to reach 50%, making it a robust test of grounded language understanding and commonsense reasoning.
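The task above is typically scored as length-normalized multiple choice: each of the four endings is scored by the model's log-likelihood given the context, normalized by ending length, and the highest-scoring ending is compared to the gold label. A minimal sketch of that procedure (the function names and the `loglik` callback are illustrative assumptions, not the lm-eval-harness API):

```python
# Sketch of length-normalized multiple-choice scoring for
# HellaSwag-style benchmarks. `loglik(ctx, ending)` stands in for a
# model's log-likelihood of `ending` given `ctx` (an assumption here).
from typing import Callable, Iterable, List


def pick_ending(ctx: str, endings: List[str],
                loglik: Callable[[str, str], float]) -> int:
    """Index of the ending with the highest per-character log-likelihood."""
    scores = [loglik(ctx, end) / max(len(end), 1) for end in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: Iterable[dict],
             loglik: Callable[[str, str], float]) -> float:
    """Fraction of examples where the top-scored ending matches the label."""
    examples = list(examples)
    correct = sum(
        pick_ending(ex["ctx"], ex["endings"], loglik) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)
```

Length normalization matters because raw log-likelihood systematically penalizes longer endings; harnesses commonly report both the raw and the normalized accuracy.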

https://huggingface.co/datasets/Rowan/hellaswag
Overall Grade: B+ (Good)
Adoption: A+ · Quality: A · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
commonsense-evaluation, sentence-completion-benchmark
Integrations
huggingface-datasets, lm-eval-harness
Use Cases
model-evaluation, commonsense-reasoning, benchmarking
API Available
No
Tags
benchmark, commonsense, sentence-completion, adversarial, grounding
Added
2026-03-17
Completeness
100%

Index Score

77
Adoption
91
Quality
88
Freshness
70
Citations
92
Engagement
0
