FrontierMath
by Epoch AI · open-source · Last verified 2026-03-01
Benchmark of original, research-level mathematics problems created by professional mathematicians. Tests capabilities at the frontier of mathematical reasoning including novel proofs, advanced computation, and multi-domain mathematical synthesis.
https://epoch.ai/frontiermath
Overall: C+ (Average)
Adoption: C+ · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: model-evaluation, mathematical-reasoning-testing, proof-assessment
- Integrations: lm-eval-harness
- Use Cases: mathematical-capability-testing, frontier-reasoning-evaluation, research
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: solve-rate, proof-validity
- Methodology: Research-level math problems with verified solutions created by professional mathematicians. Models submit final numerical answers or proof sketches evaluated by expert reviewers.
- Last Run: 2026-03-05
- Tags: benchmark, evaluation, mathematics, frontier, proof
- Added: 2026-03-17
- Completeness: 100%
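The solve-rate metric listed above can be illustrated with a minimal sketch. It assumes that each problem has a single final numerical answer and that an answer counts as solved when it matches the reference exactly as a rational number. The `Attempt` record, the matching rule, and the sample data are all hypothetical and are not taken from the benchmark's own tooling. Expert-reviewed proof sketches are out of scope here.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Attempt:
    """One model submission for one problem (hypothetical record layout)."""
    problem_id: str
    submitted: str   # final answer as emitted by the model
    reference: str   # verified answer from the problem author


def is_correct(attempt: Attempt) -> bool:
    """Assumed rule: exact equality after parsing both answers as rationals."""
    try:
        return Fraction(attempt.submitted.strip()) == Fraction(attempt.reference.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable or malformed answers count as unsolved.
        return False


def solve_rate(attempts: list[Attempt]) -> float:
    """Fraction of attempts whose final answer matches the reference."""
    if not attempts:
        return 0.0
    return sum(is_correct(a) for a in attempts) / len(attempts)


# Made-up sample data, for illustration only.
demo = [
    Attempt("p1", "42", "42"),     # exact match
    Attempt("p2", "1/2", "0.5"),   # same value, different notation
    Attempt("p3", "7", "8"),       # wrong answer
    Attempt("p4", "n/a", "3"),     # unparseable answer
]
print(solve_rate(demo))  # 0.5
```

Comparing answers as exact rationals rather than floats avoids rounding disagreements between notations such as `1/2` and `0.5`. A real harness would need a richer answer format than this sketch handles.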
Index Score: 55.9
- Adoption: 56
- Quality: 90
- Freshness: 92
- Citations: 62
- Engagement: 0
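The listing does not say how the Index Score is derived from the five sub-scores. As a quick arithmetic check, the sketch below computes their plain average. It comes out at 60.0 rather than 55.9, so the index presumably uses some other weighting that is not given here. The function name is illustrative only.

```python
# Sub-scores as shown in the listing.
sub_scores = {
    "Adoption": 56,
    "Quality": 90,
    "Freshness": 92,
    "Citations": 62,
    "Engagement": 0,
}


def unweighted_mean(scores: dict[str, int]) -> float:
    """Plain average of the sub-scores (an assumption, not the site's formula)."""
    return sum(scores.values()) / len(scores)


plain_average = unweighted_mean(sub_scores)
print(plain_average)  # 60.0

# Difference from the published Index Score of 55.9.
gap = plain_average - 55.9
print(round(gap, 1))  # 4.1
```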