MGSM
by Google Research · open-source · Last verified 2026-03-01
Multilingual Grade School Math (MGSM) benchmark, built by translating 250 problems from GSM8K into 10 languages. It tests whether mathematical reasoning abilities transfer across languages: Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai.
https://github.com/google-research/url-nlp/tree/main/mgsm
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: model-evaluation, multilingual-math-testing, cross-lingual-assessment
- Integrations: lm-eval-harness
- Use Cases: multilingual-evaluation, cross-lingual-reasoning-testing, language-fairness-assessment
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: average-accuracy, per-language-accuracy
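- Methodology: 250 GSM8K problems, each translated into 10 languages by professional translators. Models solve every problem in each language with chain-of-thought prompting, and a response is scored correct when its final numeric answer matches the reference answer.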
- Last Run: 2026-02-01
- Tags: benchmark, evaluation, math, multilingual, reasoning
- Added: 2026-03-17
- Completeness: 100%
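The two metrics listed above, per-language accuracy and average accuracy, can be sketched in a few lines of Python. This is a minimal sketch, not MGSM's or lm-eval-harness's official scorer: the helpers `extract_final_number` and `score` are hypothetical, the code assumes the last number in a chain-of-thought response is the model's final answer, and it treats the average as an unweighted mean over whichever languages are passed in (whether English is included varies between reports).

```python
from __future__ import annotations

import re
from statistics import mean

# Hypothetical pattern: an optional minus sign, digits with optional comma
# thousands separators, and an optional decimal part. Assumption: commas are
# thousands separators, which is wrong for locales that write "3,5" for 3.5.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(response: str) -> float | None:
    """Return the last number in a chain-of-thought response, or None if there is none."""
    matches = _NUMBER.findall(response)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(response: str, gold: float) -> bool:
    """Compare the extracted final answer against the reference answer."""
    pred = extract_final_number(response)
    return pred is not None and abs(pred - gold) < 1e-6


def score(results: dict[str, list[tuple[str, float]]]) -> tuple[dict[str, float], float]:
    """Compute per-language accuracy and their unweighted mean.

    `results` maps a language code to (model_response, gold_answer) pairs.
    """
    per_language: dict[str, float] = {}
    for lang, items in results.items():
        if not items:
            raise ValueError(f"no items for language {lang!r}")
        correct = sum(is_correct(response, gold) for response, gold in items)
        per_language[lang] = correct / len(items)
    return per_language, mean(per_language.values())


if __name__ == "__main__":
    # Toy responses for illustration only; these are not MGSM data.
    toy = {
        "de": [("... also 18 Dollar.", 18.0), ("Die Antwort ist 3.", 4.0)],
        "sw": [("Jibu ni 1,200.", 1200.0)],
    }
    per_lang, avg = score(toy)
    print(per_lang)  # {'de': 0.5, 'sw': 1.0}
    print(avg)  # 0.75
```

Because every MGSM language has the same 250 problems, an unweighted mean over languages and a pooled accuracy over all responses give the same number on the full benchmark; they only diverge on partial runs like the toy example.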
Index Score: 61.4
- Adoption: 70
- Quality: 82
- Freshness: 76
- Citations: 68
- Engagement: 0