
Minerva Math

by Google Research · free · Last verified 2026-03-01

Minerva Math is a quantitative reasoning benchmark designed to evaluate large language models on complex STEM problems. Drawn from LaTeX-formatted web pages and arXiv preprints, it covers subjects such as mathematics, physics, and chemistry, and its problems demand multi-step computation, symbolic manipulation, and deep scientific understanding.

https://github.com/google-research/minerva
Overall: C+ (Average) · Adoption: B · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License
Apache-2.0
Pricing
free
Capabilities
large-language-model-evaluation, quantitative-reasoning-assessment, stem-problem-solving-benchmarking, mathematical-computation-testing, symbolic-reasoning-evaluation, scientific-knowledge-application, multi-step-reasoning-analysis
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
accuracy, stem-accuracy
Methodology
Problems span STEM subjects and require mathematical computation and scientific reasoning. Models generate step-by-step solutions, and each final answer is checked for correctness; a minimal scoring sketch follows these specifications.
Last Run
2026-01-25
Tags
benchmark, evaluation, mathematics, stem, quantitative-reasoning, llm-evaluation, dataset, scientific-reasoning, natural-language-processing, ai-capability-testing
Added
2026-03-17
Completeness
95%
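
The methodology above implies a standard final-answer evaluation loop: generate a step-by-step solution, pull out the final answer, and compare it to the reference. Below is a minimal sketch of what that scoring could look like, assuming solutions end in a LaTeX \boxed{...} answer; the function names and normalization rules are illustrative, not the benchmark's published harness.

```python
import re


def extract_boxed_answer(solution: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a model solution.

    Tracks brace depth so nested expressions like \\boxed{\\frac{1}{2}}
    are recovered intact.
    """
    start = solution.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth, out = 1, []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)


def normalize(answer: str) -> str:
    """Crude canonicalization: drop \\text{} wrappers, spaces, trailing periods."""
    answer = re.sub(r"\\text\{([^}]*)\}", r"\1", answer)
    return answer.replace(" ", "").rstrip(".")


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy over extracted final answers."""
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if (ans := extract_boxed_answer(pred)) is not None
        and normalize(ans) == normalize(ref)
    )
    return correct / len(predictions)
```

Exact string match understates correctness for mathematically equal answers (1/2 versus 0.5), so published Minerva-style evaluations typically layer a symbolic equivalence check (e.g., via SymPy) on top of normalization like this.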

Index Score

58.9
Adoption
64
Quality
84
Freshness
74
Citations
66
Engagement
0
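
The 58.9 index presumably aggregates the five subscores above. Here is a minimal sketch of one plausible aggregation, a weighted average; the equal-weights default is an assumption (it yields 57.6, slightly below the listed 58.9, so the site's actual weighting evidently favors the stronger categories).

```python
# Subscores as listed for Minerva Math on this page.
SUBSCORES = {
    "adoption": 64,
    "quality": 84,
    "freshness": 74,
    "citations": 66,
    "engagement": 0,
}


def index_score(subscores: dict[str, int],
                weights: dict[str, float] | None = None) -> float:
    """Weighted average of 0-100 subscores.

    The listing does not publish its formula; with weights=None this
    falls back to equal weighting, which is only an approximation.
    """
    if weights is None:
        weights = {k: 1 / len(subscores) for k in subscores}
    return sum(weights[k] * subscores[k] for k in subscores)


print(round(index_score(SUBSCORES), 1))  # 57.6, vs. the listed 58.9
```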
