Benchmark · LLMs · v1.0

FrontierMath

by Epoch AI · open-source · Last verified 2026-03-01

Benchmark of original, research-level mathematics problems created by professional mathematicians. Tests capabilities at the frontier of mathematical reasoning including novel proofs, advanced computation, and multi-domain mathematical synthesis.

https://epoch.ai/frontiermath
Overall Score: C+ (average)

Adoption: C+ · Quality: A+ · Freshness: A+ · Citations: B · Engagement: F

Specifications

License
CC-BY-4.0
Pricing
open-source
Capabilities
model-evaluation, mathematical-reasoning-testing, proof-assessment
Integrations
lm-eval-harness
Use Cases
mathematical-capability-testing, frontier-reasoning-evaluation, research
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
solve-rate, proof-validity
Methodology
Research-level math problems with verified solutions created by professional mathematicians. Models submit final numerical answers or proof sketches evaluated by expert reviewers.
Last Run
2026-03-05
Tags
benchmark, evaluation, mathematics, frontier, proof
Added
2026-03-17
Completeness
100%
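
The methodology described above (models submit final answers, which are checked against verified solutions) suggests an exact-match solve-rate metric. The sketch below is a minimal, hypothetical illustration of that idea; the function and field names are assumptions, not the actual FrontierMath or lm-eval-harness API.

```python
# Hypothetical sketch of a solve-rate metric for exact-answer problems.
# Names (solve_rate, submissions, answer_key) are illustrative assumptions.

def solve_rate(submissions, answer_key):
    """Fraction of problems where the model's final answer exactly
    matches the verified solution."""
    solved = sum(
        1 for problem_id, answer in submissions.items()
        if answer_key.get(problem_id) == answer
    )
    return solved / len(answer_key)

# Toy example: 2 of 3 answers match the verified solutions.
submissions = {"p1": "42", "p2": "x^2+1", "p3": "7"}
answer_key = {"p1": "42", "p2": "x^2 - 1", "p3": "7"}
print(solve_rate(submissions, answer_key))
```

Proof-validity, the benchmark's second metric, would not reduce to string matching; per the methodology, proof sketches are graded by expert reviewers rather than automatically.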

Index Score
55.9

Adoption: 56
Quality: 90
Freshness: 92
Citations: 62
Engagement: 0
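
The 55.9 index score is presumably some aggregate of the five component scores. A minimal sketch of such an aggregation follows; the weighting scheme is an assumption, since the site does not publish its formula. Note that a plain unweighted mean of the listed components is 60.0, not 55.9, so the actual index must weight some components differently.

```python
# Hypothetical weighted-average aggregation of the listed component
# scores. The weights are illustrative assumptions, not the site's
# actual formula (which evidently differs, since it yields 55.9).

scores = {
    "adoption": 56,
    "quality": 90,
    "freshness": 92,
    "citations": 62,
    "engagement": 0,
}

def index_score(scores, weights):
    """Weighted mean of component scores."""
    total_weight = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total_weight

# With equal weights the mean is 60.0, above the listed 55.9,
# suggesting the real formula weights weaker components more heavily.
equal_weights = {k: 1 for k in scores}
print(round(index_score(scores, equal_weights), 1))  # 60.0
```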
