GSM8K
by OpenAI · open-source · Last verified 2026-03-01
GSM8K (Grade School Math 8K) is a benchmark of roughly 8,500 linguistically diverse grade school math word problems, each requiring 2 to 8 reasoning steps to solve. It tests basic mathematical reasoning and arithmetic through problems that must be solved sequentially, one step building on the last.
https://github.com/openai/grade-school-math
Grade: B+ (Good)
Adoption: A+ · Quality: A · Freshness: B+ · Citations: A+ · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: model-evaluation, math-reasoning-testing, step-by-step-evaluation
- Integrations: lm-eval-harness
- Use Cases: math-ability-testing, reasoning-evaluation, model-comparison
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics: accuracy, 8-shot-accuracy
- Methodology: Grade school math word problems requiring 2-8 step solutions. Models show their work and provide a final numerical answer, which is scored by exact match against the reference answer.
- Last Run: 2026-01-15
- Tags: benchmark, evaluation, math, grade-school, reasoning
- Added: 2026-03-17
- Completeness: 100%
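The exact-match scoring described under Methodology can be sketched roughly as follows. This is a minimal illustration, not the scorer used by any particular harness: it assumes reference solutions end with a `#### <answer>` line, as in the GSM8K data files, and it assumes the last number in a model's output is its final answer. The function names (`reference_answer`, `final_number`, `exact_match`, `accuracy`) are hypothetical.

```python
import re
from decimal import Decimal

# A number with optional sign, thousands separators, and decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(solution: str) -> str:
    """Return the text after the last '####' marker of a reference solution."""
    return solution.rsplit("####", 1)[-1].strip()


def final_number(text: str):
    """Return the last number found in text as a Decimal, or None if absent.

    Assumption: the final answer is the last number the model wrote.
    """
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


def exact_match(model_output: str, reference_solution: str) -> bool:
    """True when the model's final number equals the reference answer."""
    predicted = final_number(model_output)
    expected = final_number(reference_answer(reference_solution))
    return predicted is not None and predicted == expected


def accuracy(outputs, references) -> float:
    """Fraction of outputs whose final number matches the reference."""
    pairs = list(zip(outputs, references))
    if not pairs:
        return 0.0
    return sum(exact_match(o, r) for o, r in pairs) / len(pairs)
```

Comparing values as `Decimal` rather than as strings means `18`, `18.00`, and `1,250` versus `1250` are treated as equal, which is one reasonable reading of "exact match" for numeric answers. A stricter string comparison would be an equally valid design choice.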
Index Score: 75.7
- Adoption: 92
- Quality: 82
- Freshness: 70
- Citations: 90
- Engagement: 0