Benchmark · LLMs · v1.0

MGSM

by Google Research · free · Last verified 2026-03-01

MGSM (Multilingual Grade School Math) is a benchmark for evaluating the mathematical reasoning of large language models across multiple languages. It consists of 250 grade-school math problems from the GSM8K dataset, professionally translated into ten typologically diverse languages, including low-resource ones like Swahili and Telugu.

https://github.com/google-research/url-nlp/tree/main/mgsm

B (Above Average)

Adoption: B+ · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License: Apache-2.0
Pricing: free
Capabilities: Evaluating multilingual mathematical reasoning; Benchmarking large language models (LLMs); Assessing cross-lingual transfer learning; Testing numerical and algebraic reasoning skills; Supporting evaluation in 10 languages (Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai); Analyzing model performance on low-resource languages
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
Metrics: average-accuracy, per-language-accuracy
Methodology: 250 GSM8K problems translated into 10 languages by professional translators. Models solve problems in each language with chain-of-thought prompting.
Last Run: 2026-02-01
Tags: benchmark, evaluation, math, multilingual, reasoning, llm-evaluation, cross-lingual-transfer, grade-school-math, numerical-reasoning, natural-language-understanding
Added: 2026-03-17
Completeness: 0.8%
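
The two metrics listed above (per-language-accuracy and average-accuracy) can be sketched as follows. This is a minimal illustration, not the scoring harness behind this listing: it assumes the final answer is the last number in a chain-of-thought response, and the names `extract_final_number`, `score`, and the record layout are hypothetical.

```python
import re
from collections import defaultdict

# Matches integers and decimals, allowing thousands separators such as "1,250".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number found in a model response, or None.

    Assumption: the final answer of a chain-of-thought response is the
    last number it mentions. Real harnesses may use stricter parsing.
    """
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def score(records):
    """Compute per-language accuracy and their unweighted mean.

    records: iterable of (language, model_output, gold_answer) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, output, gold in records:
        total[lang] += 1
        pred = extract_final_number(output)
        if pred is not None and abs(pred - float(gold)) < 1e-6:
            correct[lang] += 1
    per_language = {lang: correct[lang] / total[lang] for lang in total}
    average = (
        sum(per_language.values()) / len(per_language) if per_language else 0.0
    )
    return per_language, average


# Toy records for illustration only; these are not MGSM problems or results.
records = [
    ("sw", "Hatua 1: 9 + 9 = 18. Jibu ni 18.", 18),
    ("sw", "Hatua 1: 3 + 4 = 7", 9),
    ("te", "Total cost is 1,250", 1250),
]
per_language, average = score(records)
```

Because every language has the same number of problems (250), the unweighted mean over languages equals the accuracy over all problems pooled together.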

Index Score

61.4