Minerva Math
by Google Research · open-source · Last verified 2026-03-01
Quantitative reasoning benchmark spanning STEM subjects including math, physics, chemistry, and engineering. Evaluates models on problems requiring mathematical computation, symbolic manipulation, and scientific reasoning with numerical or symbolic answers.
https://github.com/google-research/minerva
Overall grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: model-evaluation, quantitative-reasoning-testing, stem-assessment
- Integrations: lm-eval-harness
- Use Cases: stem-evaluation, quantitative-reasoning-testing, scientific-computing-assessment
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: accuracy, stem-accuracy
- Methodology: STEM problems requiring mathematical computation and scientific reasoning. Models generate step-by-step solutions with final answers checked for correctness.
- Last Run: 2026-01-25
- Tags: benchmark, evaluation, mathematics, stem, quantitative
- Added: 2026-03-17
- Completeness: 100%
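The methodology entry above (step-by-step solutions, with only the final answer checked) can be illustrated with a minimal Python sketch. This is not the benchmark's actual grader: the `Final answer:` marker, the normalization rules, the numeric tolerance, and all function names are assumptions made for illustration, and the real evaluation may normalize symbolic answers quite differently.

```python
import math
import re
from typing import Optional

# Assumed marker for the final answer; the real prompt format may differ.
FINAL_ANSWER_RE = re.compile(r"final answer:\s*(.+)", re.IGNORECASE)


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None."""
    matches = FINAL_ANSWER_RE.findall(solution)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Drop a trailing period, surrounding $ signs, and spaces; lowercase."""
    cleaned = answer.strip().rstrip(".").strip("$").strip()
    return cleaned.replace(" ", "").lower()


def is_correct(predicted: Optional[str], reference: str,
               rel_tol: float = 1e-6) -> bool:
    """Compare numerically when both sides parse as floats, else as strings."""
    if predicted is None:
        return False
    pred, ref = normalize(predicted), normalize(reference)
    try:
        return math.isclose(float(pred), float(ref), rel_tol=rel_tol)
    except ValueError:
        # Symbolic answers fall back to exact match after normalization.
        return pred == ref


def accuracy(solutions: list[str], references: list[str]) -> float:
    """Fraction of problems whose extracted final answer matches."""
    if not references:
        return 0.0
    hits = sum(
        is_correct(extract_final_answer(s), r)
        for s, r in zip(solutions, references)
    )
    return hits / len(references)


# Made-up example data, not taken from the benchmark.
solutions = [
    "F = m * a = 2 * 4.9 = 9.8\nFinal answer: 9.80",
    "Differentiate x^3 to get 3x^2.\nFinal answer: $3x^2$.",
    "I am not sure how to proceed.",
]
references = ["9.8", "3 x^2", "42"]
score = accuracy(solutions, references)
```

In this sketch a solution with no final-answer marker counts as incorrect, and string matching will reject symbolically equivalent forms such as `x*3` versus `3x`; a production grader would typically need a more careful equivalence check.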
Index Score: 58.9
- Adoption: 64
- Quality: 84
- Freshness: 74
- Citations: 66
- Engagement: 0