MATH

by UC Berkeley · open-source · Last verified 2026-03-01

A collection of 12,500 problems drawn from AMC, AIME, and other mathematics competitions, covering algebra, geometry, number theory, combinatorics, and more. Problems require multi-step reasoning and mathematical insight beyond pattern matching.

https://github.com/hendrycks/math
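
The repository distributes each problem as a standalone JSON file, split into train and test directories by subject. A minimal loading sketch, assuming a local extraction of the archive (the MATH/test path is a placeholder; field names follow the repository's JSON schema):

```python
import json
from pathlib import Path

# Assumed extraction path; point this at wherever you unpacked the archive.
MATH_DIR = Path("MATH/test")

problems = []
for json_file in MATH_DIR.glob("*/*.json"):  # one file per problem, grouped by subject
    with json_file.open() as f:
        record = json.load(f)
    # Each record holds the problem statement, a difficulty label
    # ("Level 1" through "Level 5"), a subject such as "Algebra", and a
    # worked solution whose final answer is wrapped in \boxed{...}.
    problems.append(record)

level5 = [p for p in problems if p["level"] == "Level 5"]
print(f"{len(problems)} test problems, {len(level5)} at Level 5")
```

The Level 5 subset is what the level-5-accuracy metric under Specifications measures.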
Overall Grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: model-evaluation, competition-math-testing, advanced-reasoning-assessment
Integrations: lm-eval-harness (a run sketch appears at the end of this page)
Use Cases: mathematical-reasoning-evaluation, frontier-model-comparison, research
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics: accuracy, level-5-accuracy
Methodology: Competition math problems at 5 difficulty levels. Models generate step-by-step solutions, and final answers are compared against ground truth using symbolic equivalence checking (a minimal version of such a check is sketched after this list).
Last Run: 2026-01-20
Tags: benchmark, evaluation, mathematics, competition, reasoning
Added: 2026-03-17
Completeness: 100%
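
The symbolic equivalence checking named in the Methodology can be approximated with SymPy. A minimal sketch, not the benchmark's actual checker: extract_boxed and answers_match are illustrative helpers, and the comparison assumes answers in plain notation that SymPy can parse (LaTeX forms such as \frac{1}{2} would need normalization first).

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr

def extract_boxed(solution: str) -> str:
    """Return the contents of the last \\boxed{...} in a solution string,
    matching braces so nested expressions survive."""
    start = solution.rfind("\\boxed{")
    if start == -1:
        return solution.strip()
    i, depth, out = start + len("\\boxed{"), 1, []
    while i < len(solution) and depth:
        c = solution[i]
        depth += (c == "{") - (c == "}")
        if depth:
            out.append(c)
        i += 1
    return "".join(out)

def answers_match(pred: str, truth: str) -> bool:
    """Treat two answers as equivalent when their difference simplifies
    to zero; fall back to string comparison when parsing fails."""
    try:
        return sympy.simplify(parse_expr(pred) - parse_expr(truth)) == 0
    except Exception:
        return pred.strip() == truth.strip()

print(extract_boxed(r"... so the answer is $\boxed{\frac{1}{2}}$."))  # prints \frac{1}{2}
print(answers_match("1/2", "0.5"))  # True (symbolically equal)
```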

Index Score

Overall: 74.4
Adoption: 88
Quality: 86
Freshness: 74
Citations: 88
Engagement: 0
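
Running MATH through the lm-eval-harness integration listed above can be done from Python. A hedged sketch: the "hf" backend string, the hendrycks_math task id, and the placeholder checkpoint are assumptions that vary across harness versions, so check the task list your installed version exposes.

```python
from lm_eval import simple_evaluate

# Task id and model are assumptions; substitute the ids your harness
# version actually registers for the MATH benchmark.
results = simple_evaluate(
    model="hf",                                       # Hugging Face backend
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",  # placeholder model
    tasks=["hendrycks_math"],
    num_fewshot=4,   # few-shot prompting; the runs reported above may differ
    batch_size=8,
)
print(results["results"])  # per-subject accuracy breakdown
```

Per-subject scores roll up into the accuracy metric above; restricting the same comparison to the hardest tier gives level-5-accuracy.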
