BenchmarkLLMs v1.0

GSM8K

by OpenAI · open-source · Last verified 2026-03-01

Grade School Math 8K (GSM8K) is a benchmark of 8,500 linguistically diverse grade-school math word problems, each requiring 2 to 8 steps of reasoning to solve. It tests basic mathematical reasoning and arithmetic: a problem can only be answered by chaining several elementary calculations in sequence.
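
In the released dataset, each reference solution is written as a short sequence of natural-language steps. Many of these steps carry calculator annotations of the form <<expression=result>>, and the solution ends with a line "#### <final answer>". The Python sketch below assumes that format. It splits a solution into steps, checks each annotation's arithmetic, and extracts the final answer. The helpers safe_eval and parse_solution are hypothetical names used for illustration and are not part of the GSM8K repository. The sample solution is made up rather than taken from the dataset.

```python
import ast
import operator
import re

# Arithmetic operators allowed inside a calculator annotation.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Matches annotations such as <<3*12=36>>, capturing the expression and the claimed result.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr.strip(), mode="eval"))


def parse_solution(solution: str) -> dict:
    """Split a GSM8K-style solution into reasoning steps and a final answer.

    Raises ValueError if the '####' line is missing or an annotation is wrong.
    """
    body, sep, final = solution.rpartition("####")
    if not sep:
        raise ValueError("missing '#### <answer>' line")
    steps = [line.strip() for line in body.splitlines() if line.strip()]
    for step in steps:
        for expr, claimed in ANNOTATION.findall(step):
            if abs(safe_eval(expr) - float(claimed)) > 1e-6:
                raise ValueError(f"annotation does not check out: {step!r}")
    return {"steps": steps, "num_steps": len(steps), "answer": final.strip()}


# A made-up two-step problem written in the GSM8K answer format.
example = (
    "Sam buys 3*12 = <<3*12=36>>36 pencils.\n"
    "After giving 10 away he has 36-10 = <<36-10=26>>26 pencils.\n"
    "#### 26"
)
result = parse_solution(example)
print(result["num_steps"], result["answer"])  # 2 26
```

Counting the non-empty lines before the #### marker is only a rough proxy for the number of reasoning steps. The 2 to 8 step range is the benchmark's own characterization, and this count does not enforce it.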

https://github.com/openai/grade-school-math
Grade: B+ (Good)
Adoption: A+ · Quality: A · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: model-evaluation, math-reasoning-testing, step-by-step-evaluation
Integrations: lm-eval-harness
Use Cases: math-ability-testing, reasoning-evaluation, model-comparison
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics: accuracy, 8-shot-accuracy
Methodology: Grade-school math word problems requiring 2 to 8 solution steps. Models show their work and then give a final numerical answer, which is scored by exact match against the reference answer.
Last Run: 2026-01-15
Tags: benchmark, evaluation, math, grade-school, reasoning
Added: 2026-03-17
Completeness: 100%
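
The Methodology and Metrics entries can be illustrated with a minimal Python sketch of exact-match scoring. This is not the official GSM8K or lm-eval-harness scorer. It rests on three assumptions: (1) the model's final answer is the last number in its output; (2) commas and trailing zero decimals are stripped before the exact string comparison; and (3) a k-shot prompt is formed by prepending k worked examples in a hypothetical "Question: ... Answer: ..." template. All function names are hypothetical. Real harnesses use their own templates and extraction rules, so numbers produced by this sketch need not match published results.

```python
import re
from typing import Optional, Sequence, Tuple

# Integers or decimals, optionally negative, with optional thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(num: str) -> str:
    """Drop thousands separators and trailing zero decimals, e.g. '1,250.0' -> '1250'."""
    num = num.replace(",", "")
    if "." in num:
        num = num.rstrip("0").rstrip(".")
    return num


def extract_last_number(text: str) -> Optional[str]:
    """Assumed extraction rule: the model's answer is the last number it writes."""
    matches = NUMBER.findall(text)
    return normalize(matches[-1]) if matches else None


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions end with a '#### <answer>' line."""
    return normalize(solution.split("####")[-1].strip())


def accuracy(predictions: Sequence[str], solutions: Sequence[str]) -> float:
    """Fraction of problems whose extracted answer exactly matches the reference."""
    if not solutions or len(predictions) != len(solutions):
        raise ValueError("need equally many, non-zero predictions and solutions")
    hits = sum(
        extract_last_number(pred) == reference_answer(sol)
        for pred, sol in zip(predictions, solutions)
    )
    return hits / len(solutions)


def build_fewshot_prompt(
    exemplars: Sequence[Tuple[str, str]], question: str, k: int = 8
) -> str:
    """Prepend k worked (question, solution) exemplars; k=8 gives an 8-shot prompt."""
    if len(exemplars) < k:
        raise ValueError(f"need at least {k} exemplars")
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars[:k]]
    return "\n\n".join(shots + [f"Question: {question}\nAnswer:"])


refs = ["He buys 3*12 = <<3*12=36>>36 pencils.\n#### 36", "#### 1250", "#### 42"]
preds = ["So the total is 36 pencils.", "The answer is 1,250 dollars.", "Roughly 40."]
print(round(accuracy(preds, refs), 3))  # 0.667
```

Taking the last number is a deliberately simple rule. A stricter variant would accept only an answer that the model writes after its own #### marker.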

Index Score: 75.7

Adoption: 92
Quality: 82
Freshness: 70
Citations: 90
Engagement: 0
