Minerva Math
by Google Research · free · Last verified 2026-03-01
Minerva Math is a quantitative reasoning benchmark for evaluating large language models on STEM problems. Its problems are sourced from web pages containing LaTeX and from arXiv preprints, and cover subjects such as math, physics, and chemistry. Solving them requires multi-step computation, symbolic manipulation, and applied scientific knowledge.
https://github.com/google-research/minerva
Grade: C (Below Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: F · Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- free
- Capabilities
- large-language-model-evaluation, quantitative-reasoning-assessment, stem-problem-solving-benchmarking, mathematical-computation-testing, symbolic-reasoning-evaluation, scientific-knowledge-application, multi-step-reasoning-analysis
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- accuracy, stem-accuracy
- Methodology
- STEM problems requiring mathematical computation and scientific reasoning. Models generate step-by-step solutions with final answers checked for correctness.
- Last Run
- 2026-01-25
- Tags
- benchmark, evaluation, mathematics, stem, quantitative-reasoning, llm-evaluation, dataset, scientific-reasoning, natural-language-processing, ai-capability-testing
- Added
- 2026-03-17
- Completeness
- 80%
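The Methodology entry above (step-by-step solutions, with the final answer checked for correctness) and the accuracy metric can be illustrated with a minimal sketch. It assumes, purely for illustration, that each solution ends with a line of the form "Final Answer: <value>" and that answers are compared by exact string match after light normalization. The function names are hypothetical and not taken from the benchmark's code. Real graders for math answers usually need symbolic-equivalence checks, which this sketch does not attempt.

```python
import re
from typing import List, Optional

# Assumption: solutions end with a line "Final Answer: <value>".
# This convention is illustrative, not confirmed by the listing.
FINAL_ANSWER = re.compile(r"Final Answer:\s*(.+)", re.IGNORECASE)


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the value of the last 'Final Answer:' line, or None if absent."""
    matches = FINAL_ANSWER.findall(solution)
    return matches[-1] if matches else None


def normalize(answer: str) -> str:
    """Light normalization: trim, drop a trailing period, dollar signs, spaces."""
    cleaned = answer.strip().rstrip(".")
    cleaned = cleaned.replace("$", "").replace(" ", "")
    return cleaned.lower()


def is_correct(solution: str, reference: str) -> bool:
    """Exact match on normalized final answers (no symbolic equivalence)."""
    predicted = extract_final_answer(solution)
    return predicted is not None and normalize(predicted) == normalize(reference)


def accuracy(solutions: List[str], references: List[str]) -> float:
    """Fraction of solutions whose final answer matches its reference."""
    if len(solutions) != len(references):
        raise ValueError("solutions and references must have the same length")
    if not solutions:
        return 0.0
    correct = sum(is_correct(s, r) for s, r in zip(solutions, references))
    return correct / len(solutions)


# Toy data, invented for this example.
solutions = [
    "Step 1: 2 + 3 = 5.\nFinal Answer: $5$.",
    "We estimate the sum.\nFinal Answer: 7",
    "The model gave no final answer.",
]
references = ["5", "8", "1"]
print(f"accuracy = {accuracy(solutions, references):.3f}")
```

A solution with no extractable final answer is counted as incorrect here rather than skipped, so the denominator is always the full problem set.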
Index Score
42
Adoption
64
Quality
84
Freshness
74
Citations
0
Engagement
0