Minerva Math
by Google Research · free · Last verified 2026-03-01
Minerva Math is a quantitative reasoning benchmark designed to evaluate large language models on complex STEM problems. Sourced from web pages with LaTeX and arXiv preprints, it covers subjects like math, physics, and chemistry, requiring multi-step computation, symbolic manipulation, and deep scientific understanding to solve.
https://github.com/google-research/minerva
Overall grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: free
- Capabilities: large-language-model-evaluation, quantitative-reasoning-assessment, stem-problem-solving-benchmarking, mathematical-computation-testing, symbolic-reasoning-evaluation, scientific-knowledge-application, multi-step-reasoning-analysis
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: accuracy, stem-accuracy
- Methodology: STEM problems requiring mathematical computation and scientific reasoning. Models generate step-by-step solutions, and the final answer of each solution is checked for correctness (a minimal grading sketch follows this list).
- Last Run: 2026-01-25
- Tags: benchmark, evaluation, mathematics, stem, quantitative-reasoning, llm-evaluation, dataset, scientific-reasoning, natural-language-processing, ai-capability-testing
- Added: 2026-03-17
- Completeness: 95%
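The methodology above grades a model by checking the final answer of its step-by-step solution. Below is a minimal sketch of that checking step in Python, assuming the model is prompted to place its final answer in a LaTeX \boxed{} expression and that answers are compared after simple string normalization; real harnesses may additionally apply symbolic equivalence checks. The function names are illustrative, not part of any published evaluation code.

```python
import re


def extract_final_answer(solution: str) -> str | None:
    """Return the last \\boxed{...} expression in a step-by-step solution, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Crude normalization: drop spaces, surrounding $, and trailing periods."""
    return answer.replace(" ", "").strip("$").rstrip(".")


def is_correct(solution: str, reference: str) -> bool:
    """Count a solution as correct if its normalized final answer matches the reference."""
    predicted = extract_final_answer(solution)
    return predicted is not None and normalize(predicted) == normalize(reference)


# Example: a toy solution graded against the reference answer "42".
sample = "Step 1: compute 6 * 7. Step 2: the answer is \\boxed{42}."
print(is_correct(sample, "42"))  # True
```

Accuracy on the benchmark is then just the fraction of problems for which this check returns True.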
Index Score
- Overall: 58.9
- Adoption: 64
- Quality: 84
- Freshness: 74
- Citations: 66
- Engagement: 0