DROP
by Allen AI · free · Last verified 2026-03-01
DROP (Discrete Reasoning Over Paragraphs) is a challenging reading-comprehension benchmark that evaluates a model's ability to reason numerically over text. Systems read a paragraph and answer questions that require discrete operations such as addition, subtraction, counting, sorting, or comparison. In extractive QA datasets the answer is usually a span copied from the passage. DROP questions, by contrast, often need several reasoning steps, so basic information retrieval is not enough.
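To make the question style concrete, here is a small Python sketch of the discrete operations listed above. The passage, the questions, and the helper `numbers_in` are hypothetical and invented for illustration. They are not drawn from the dataset or from any AllenAI tooling. A model evaluated on DROP has to carry out these operations implicitly from the text, whereas this sketch hard-codes which sentence belongs to which team.

```python
import re

# Hypothetical DROP-style passage (invented for illustration, not from the dataset).
PASSAGE = (
    "The Bears scored on field goals of 23, 41 and 37 yards. "
    "The Packers answered with field goals of 19 and 52 yards."
)


def numbers_in(sentence: str) -> list[int]:
    """Return every integer in a sentence, in order of appearance."""
    return [int(tok) for tok in re.findall(r"\d+", sentence)]


# Split the passage into its two sentences and pull out each team's kicks.
bears_sentence, packers_sentence = PASSAGE.split(". ")
bears = numbers_in(bears_sentence)
packers = numbers_in(packers_sentence)

# Each question needs a discrete operation, not just a span lookup.
answers = {
    # Counting
    "How many field goals did the Bears kick?": len(bears),
    # Addition
    "How many total yards of field goals did the Bears kick?": sum(bears),
    # Sorting
    "List the Bears' field goals from shortest to longest.": sorted(bears),
    # Comparison plus subtraction
    "How many yards longer was the longest Packers field goal "
    "than the longest Bears field goal?": max(packers) - max(bears),
    # Comparison across entities
    "Which team kicked the shortest field goal?":
        "Bears" if min(bears) < min(packers) else "Packers",
}

for question, answer in answers.items():
    print(f"{question} -> {answer}")
```

None of the answers (3, 101, [23, 37, 41], 11, "Packers") can be copied verbatim from the passage as a single span, which is the property that separates DROP-style questions from extractive QA.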
https://allenai.org/data/drop
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: free
- Capabilities: multi-step reasoning evaluation, numerical reasoning assessment, arithmetic operation testing (addition, subtraction), counting and sorting validation, comparative reasoning analysis, information extraction from complex passages, negation handling in questions, coreference resolution testing
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: F1 score, exact match (EM)
- Methodology: Reading comprehension with questions requiring discrete reasoning operations, such as counting, sorting, and arithmetic, over passage content.
- Last Run: 2026-01-25
- Tags: benchmark, dataset, evaluation, reading-comprehension, reasoning, numerical, question-answering, natural-language-processing, arithmetic-reasoning, multi-step-reasoning
- Added: 2026-03-17
- Completeness: 0.9%
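The two metrics can be illustrated with a simplified Python sketch, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the"). The number check inside `f1` is an assumed approximation of DROP's number-aware F1, not the official AllenAI evaluation script, which also handles multi-span answers and number formatting. The function names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, and split into tokens."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 with a simplified number check (an assumption, not DROP's exact rule)."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred_tokens == gold_tokens)
    # Simplified numeric rule: if the gold answer contains numbers, at least one
    # of them must appear in the prediction, otherwise no partial credit is given.
    gold_nums = {t for t in gold_tokens if t.isdigit()}
    if gold_nums and not gold_nums & set(pred_tokens):
        return 0.0
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("101 yards", "101"))   # 0.0
print(round(f1("101 yards", "101"), 3))  # 0.667
print(f1("102", "101"))                  # 0.0
```

The last two calls show why F1 is reported alongside EM: a correct number with an extra unit word earns partial credit, while a wrong number earns none. When a question has several annotated answers, a common convention is to score the prediction against each and keep the maximum. Note that stripping punctuation turns "3.5" into "35", a case a production scorer would need to handle.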
Index Score: 66.7
- Adoption: 76
- Quality: 84
- Freshness: 72
- Citations: 78
- Engagement: 0