
MGSM

by Google Research · free · Last verified 2026-03-01

MGSM (Multilingual Grade School Math) is a benchmark for evaluating the mathematical reasoning of large language models across multiple languages. It consists of 250 grade-school math problems from the GSM8K dataset, professionally translated into ten typologically diverse languages, including low-resource ones like Swahili and Telugu.

https://github.com/google-research/url-nlp/tree/main/mgsm
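The linked repository distributes one file per language. Assuming a tab-separated question/answer layout with file names like `mgsm_en.tsv` (worth verifying against the repo before relying on it), a minimal loader could look like:

```python
def load_mgsm_tsv(text):
    """Parse an MGSM split in the assumed TSV layout:
    one problem per line, question and gold answer separated by a tab.
    Answers are kept as strings to avoid assuming a numeric format."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        question, answer = line.split("\t")
        rows.append({"question": question, "answer": answer.strip()})
    return rows

# Tiny inline sample standing in for the contents of e.g. mgsm_en.tsv
sample = "Janet has 3 apples and buys 4 more. How many does she have now?\t7\n"
problems = load_mgsm_tsv(sample)
```

The same parser would apply unchanged to every language split, since only the question text varies across files.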
Overall grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License
Apache-2.0
Pricing
free
Capabilities
Evaluating multilingual mathematical reasoning, Benchmarking large language models (LLMs), Assessing cross-lingual transfer learning, Testing numerical and algebraic reasoning skills, Supporting evaluation in 10 languages: Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai, Analyzing model performance on low-resource languages
Integrations
Use Cases
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics
average-accuracy, per-language-accuracy
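The two reported metrics can be computed from per-example correctness flags: accuracy within each language, and the unweighted mean of those per-language accuracies. A small sketch (the input shape is an assumption, not the official harness):

```python
from collections import defaultdict

def mgsm_scores(results):
    """results: list of (language, is_correct) pairs.
    Returns per-language accuracy and their unweighted average,
    mirroring the benchmark's per-language-accuracy and
    average-accuracy metrics."""
    by_lang = defaultdict(list)
    for lang, correct in results:
        by_lang[lang].append(correct)
    per_lang = {lang: sum(flags) / len(flags) for lang, flags in by_lang.items()}
    average = sum(per_lang.values()) / len(per_lang)
    return per_lang, average

per_lang, avg = mgsm_scores([("en", True), ("en", False), ("sw", True)])
# per_lang -> {"en": 0.5, "sw": 1.0}; avg -> 0.75
```

Averaging per-language scores (rather than pooling all examples) keeps low-resource languages like Swahili and Telugu weighted equally with high-resource ones.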
Methodology
250 GSM8K problems translated into 10 languages by professional translators. Models solve problems in each language with chain-of-thought prompting.
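Chain-of-thought evaluations of this kind are commonly scored by taking the last number in the model's completion as its final answer. A minimal extraction-and-scoring helper, sketched under that assumption (the regex and tolerance are illustrative, not the official harness):

```python
import re

def extract_final_number(completion):
    """Return the last integer-or-decimal in a chain-of-thought
    completion, or None if no number appears."""
    matches = re.findall(r"-?\d[\d,]*\.?\d*", completion)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def is_correct(completion, gold):
    """Score a completion against a numeric gold answer."""
    pred = extract_final_number(completion)
    return pred is not None and abs(pred - gold) < 1e-6

cot = ("Step 1: 16 - 3 - 4 = 9 eggs left. "
       "She sells them at $2 each: 9 * 2 = 18. The answer is 18.")
```

Taking the last number, rather than the first, matters because chain-of-thought completions contain many intermediate values before the final answer.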
Last Run
2026-02-01
Tags
benchmark, evaluation, math, multilingual, reasoning, llm-evaluation, cross-lingual-transfer, grade-school-math, numerical-reasoning, natural-language-understanding
Added
2026-03-17
Completeness
0.8%

Index Score

61.4
Adoption
70
Quality
82
Freshness
76
Citations
68
Engagement
0
