Benchmark · LLMs · v1.0

FrontierMath

by Epoch AI · open-source · Last verified 2026-03-01

Benchmark of original, research-level mathematics problems created by professional mathematicians. Tests capabilities at the frontier of mathematical reasoning including novel proofs, advanced computation, and multi-domain mathematical synthesis.

https://epoch.ai/frontiermath
Overall Score: C+ (average)

Adoption: C+ · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F

Specifications

License
CC-BY-4.0
Pricing
open-source
Capabilities
model-evaluation, mathematical-reasoning-testing, proof-assessment
Integrations
lm-eval-harness
Use Cases
mathematical-capability-testing, frontier-reasoning-evaluation, research
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
solve-rate, proof-validity
Methodology
Research-level math problems with verified solutions created by professional mathematicians. Models submit final numerical answers or proof sketches evaluated by expert reviewers.
Last Run
2026-03-05
Tags
benchmark, evaluation, mathematics, frontier, proof
Added
2026-03-17
Completeness
100%
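
The methodology described above (models submit final answers, which are checked against verified solutions) suggests an exact-match solve-rate metric. The sketch below is a minimal, hypothetical illustration of that idea; the function and field names are assumptions, not the actual FrontierMath or lm-eval-harness API.

```python
# Hypothetical sketch of a solve-rate metric for exact-answer problems.
# Names (solve_rate, submissions, answer_key) are illustrative assumptions.

def solve_rate(submissions, answer_key):
    """Fraction of problems where the model's final answer exactly
    matches the verified solution."""
    solved = sum(
        1 for problem_id, answer in submissions.items()
        if answer_key.get(problem_id) == answer
    )
    return solved / len(answer_key)

# Toy example: 2 of 3 answers match the verified solutions.
submissions = {"p1": "42", "p2": "x^2+1", "p3": "7"}
answer_key = {"p1": "42", "p2": "x^2 - 1", "p3": "7"}
print(solve_rate(submissions, answer_key))
```

Proof-validity, the benchmark's second metric, would not reduce to string matching; per the methodology, proof sketches are graded by expert reviewers rather than automatically.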

Index Score
55.9

Adoption: 56
Quality: 90
Freshness: 92
Citations: 62
Engagement: 0
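
The 55.9 index score is presumably some aggregate of the five component scores. A minimal sketch of such an aggregation follows; the weighting scheme is an assumption, since the site does not publish its formula. Note that a plain unweighted mean of the listed components is 60.0, not 55.9, so the actual index must weight some components differently.

```python
# Hypothetical weighted-average aggregation of the listed component
# scores. The weights are illustrative assumptions, not the site's
# actual formula (which evidently differs, since it yields 55.9).

scores = {
    "adoption": 56,
    "quality": 90,
    "freshness": 92,
    "citations": 62,
    "engagement": 0,
}

def index_score(scores, weights):
    """Weighted mean of component scores."""
    total_weight = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total_weight

# With equal weights the mean is 60.0, above the listed 55.9,
# suggesting the real formula weights weaker components more heavily.
equal_weights = {k: 1 for k in scores}
print(round(index_score(scores, equal_weights), 1))  # 60.0
```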
