Benchmark · LLMs · v1.0

DROP

by Allen AI · free · Last verified 2026-03-01

DROP (Discrete Reasoning Over Paragraphs) is a reading-comprehension benchmark that evaluates a model's numerical reasoning over text. Systems read a paragraph and answer questions that require discrete operations such as addition, counting, sorting, or comparison. Unlike simpler QA datasets, where locating the right passage span is usually enough, DROP questions often require combining several pieces of information across multiple reasoning steps, so retrieval alone does not suffice.
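
To make the discrete operations named above concrete, here is a minimal Python sketch. The passage, the questions, and the helper `field_goal_yards` are all hypothetical and written only for illustration; they are not taken from the DROP dataset, and a real system would have to work out the operations from the question rather than have them hard-coded.

```python
import re

# Hypothetical passage written for illustration; not from the DROP dataset.
PASSAGE = (
    "The Hawks kicked a 24-yard field goal in the first quarter, "
    "a 38-yard field goal in the second quarter, "
    "and a 51-yard field goal in the fourth quarter."
)


def field_goal_yards(passage: str) -> list[int]:
    """Extract the yardage of every field goal mentioned in the passage."""
    return [int(m) for m in re.findall(r"(\d+)-yard field goal", passage)]


yards = field_goal_yards(PASSAGE)

# Counting: "How many field goals were kicked?"
count_answer = len(yards)

# Comparison / sorting: "How long was the longest field goal?"
longest_answer = max(yards)

# Arithmetic: "How many yards longer was the longest field goal than the shortest?"
difference_answer = max(yards) - min(yards)

print(count_answer, longest_answer, difference_answer)
```

None of the three answers appears as a single literal span tied to the question, which is why span extraction alone is not enough for this style of question.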

https://allenai.org/data/drop
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License
CC BY-SA 4.0
Pricing
free
Capabilities
multi-step reasoning evaluation, numerical reasoning assessment, arithmetic operation testing (addition, subtraction), counting and sorting validation, comparative reasoning analysis, information extraction from complex passages, negation handling in questions, coreference resolution testing
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
f1-score, exact-match
Methodology
Reading comprehension with questions requiring discrete reasoning operations like counting, sorting, and arithmetic over passage content.
Last Run
2026-01-25
Tags
benchmark, dataset, evaluation, reading-comprehension, reasoning, numerical, question-answering, natural-language-processing, arithmetic-reasoning, multi-step-reasoning
Added
2026-03-17
Completeness
0.9%
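
The two metrics listed above, exact-match and f1-score, can be sketched as follows. This is a simplified, single-answer version under stated assumptions: the function names `normalize`, `exact_match`, and `f1_score` are chosen here for illustration, and the normalization (lowercasing, trimming punctuation at token edges, dropping English articles) follows the common reading-comprehension convention. The official DROP evaluator does more, including number-aware matching and alignment of multi-span answers, which this sketch does not attempt.

```python
import string
from collections import Counter

ARTICLES = {"a", "an", "the"}


def normalize(text: str) -> list[str]:
    """Lowercase, trim punctuation at token edges, and drop articles."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in ARTICLES]


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the prediction and a single gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as
    # it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A partially correct answer gets no exact-match credit but some F1 credit.
print(exact_match("27 yards", "27"), f1_score("27 yards", "27"))
```

Reporting both metrics is useful because exact-match is strict about extra or missing words, while F1 gives partial credit for overlapping tokens.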

Index Score

66.7
Adoption
76
Quality
84
Freshness
72
Citations
78
Engagement
0
