MGSM

by Google Research · open-source · Last verified 2026-03-01

MGSM (Multilingual Grade School Math) is a benchmark of 250 problems from GSM8K, translated into 10 languages. It tests whether mathematical reasoning abilities transfer across languages, covering Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai.

https://github.com/google-research/url-nlp/tree/main/mgsm
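A minimal sketch of reading one language split. It assumes that each data file in the repository above is a headerless, tab-separated `question<TAB>answer` file named like `mgsm_<lang>.tsv`. The file naming, the language codes in `MGSM_LANGS`, and the helper names are hypothetical, so check the repository before relying on them.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical codes for the 10 MGSM languages listed above (not taken from the repo).
MGSM_LANGS = ["bn", "zh", "fr", "de", "ja", "ru", "es", "sw", "te", "th"]


def parse_mgsm_tsv(text: str) -> List[Tuple[str, int]]:
    """Parse `question<TAB>answer` lines into (question, answer) pairs.

    Assumes no header row and an integer answer in the last column.
    """
    examples: List[Tuple[str, int]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last tab so the answer is always the final column.
        question, answer = line.rsplit("\t", 1)
        examples.append((question.strip(), int(answer.strip().replace(",", ""))))
    return examples


def load_language(data_dir: str, lang: str) -> List[Tuple[str, int]]:
    """Load one language split; the `mgsm_<lang>.tsv` naming is an assumption."""
    path = Path(data_dir) / f"mgsm_{lang}.tsv"
    return parse_mgsm_tsv(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = "Tom has 3 apples and buys 2 more. How many does he have?\t5\n"
    print(parse_mgsm_tsv(sample))
```

A full run would loop over `MGSM_LANGS` and call `load_language` once per language, giving 250 examples each if the files match the stated benchmark size.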
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: model-evaluation, multilingual-math-testing, cross-lingual-assessment
Integrations: lm-eval-harness
Use Cases: multilingual-evaluation, cross-lingual-reasoning-testing, language-fairness-assessment
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics: average-accuracy, per-language-accuracy
Methodology: 250 GSM8K problems translated into 10 languages by professional translators. Models solve problems in each language with chain-of-thought prompting.
Last Run: 2026-02-01
Tags: benchmark, evaluation, math, multilingual, reasoning
Added: 2026-03-17
Completeness: 100%
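The chain-of-thought methodology and the two metrics listed above can be sketched as follows. This sketch assumes that the model's final answer is the last number in its chain-of-thought output and that average accuracy is an unweighted mean over languages. Both choices are illustrative assumptions, not the benchmark's official scorer, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Integers or decimals, optionally with comma thousands separators ("1,234", "-3.5").
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number in a chain-of-thought response, or None.

    "Last number wins" is an assumed heuristic, not necessarily MGSM's rule.
    """
    matches = _NUMBER.findall(response)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(response: str, gold: int) -> bool:
    """Compare the extracted answer with the gold integer answer."""
    pred = extract_final_number(response)
    return pred is not None and abs(pred - gold) < 1e-6


def accuracy_report(
    records: Iterable[Tuple[str, str, int]],
) -> Tuple[Dict[str, float], float]:
    """Compute per-language accuracy and their unweighted (macro) mean.

    `records` yields (language, model_response, gold_answer). With the same
    number of problems in every language, the macro mean equals pooled accuracy.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, response, gold in records:
        total[lang] += 1
        correct[lang] += int(is_correct(response, gold))
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    average = sum(per_lang.values()) / len(per_lang) if per_lang else 0.0
    return per_lang, average


if __name__ == "__main__":
    demo = [
        ("de", "Sie hat 3 + 2 = 5 Äpfel. Die Antwort ist 5.", 5),
        ("de", "Die Antwort ist 7.", 6),
        ("sw", "Jibu ni 12.", 12),
    ]
    print(accuracy_report(demo))  # ({'de': 0.5, 'sw': 1.0}, 0.75)
```

Averaging per-language scores rather than pooling all responses keeps each language's weight equal even if some runs are incomplete. That fits the cross-lingual comparison the benchmark is built for.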

Index Score: 61.4
Adoption: 70
Quality: 82
Freshness: 76
Citations: 68
Engagement: 0