Benchmark · LLMs · v1.1

TyDi QA

by Clark et al. / Google Research · open-source · Last verified 2026-03-17

TyDi QA (Typologically Diverse Question Answering) is a multilingual QA benchmark covering 11 typologically diverse languages from different language families. Unlike benchmarks translated from English, questions are natively authored by speakers of each language, providing genuine linguistic diversity.

https://ai.google.com/research/tydiqa
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B · Citations: B+ · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, multilingual-qa, reading-comprehension
Integrations
Use Cases
model-evaluation, multilingual-nlp, qa-systems
API Available
No
Evaluated Models
gpt-4o, multilingual-bert, xlm-roberta-large, gemini-2-5-pro
Metrics
f1-score, exact-match
Methodology
Gold passage (GoldP) task: the model selects an answer span from a provided Wikipedia passage. GoldP F1 and EM are averaged across 11 languages. The primary task is span extraction; the secondary task is answer presence detection (boolean).
Last Run
2025-12-20
Tags
question-answering, multilingual, typologically-diverse, reading-comprehension
Added
2026-03-17
Completeness
100%
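
The Methodology entry above reports F1 and exact match (EM) on extracted answer spans, averaged across languages. The Python sketch below illustrates that kind of scoring under stated assumptions. It is not the official TyDi QA evaluation script. The function names (exact_match, token_f1, macro_average), the whitespace tokenization, the language codes, and the toy prediction/gold pairs are all hypothetical and chosen only for illustration. Whitespace tokenization in particular does not suit languages written without spaces between words.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == gold.strip())


def token_f1(prediction: str, gold: str) -> float:
    """Whitespace-token F1 between a predicted span and a gold span.

    Assumption: tokens are split on whitespace, with no normalization.
    """
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_average(per_language: dict[str, list[tuple[str, str]]]) -> dict[str, float]:
    """Average F1 and EM within each language, then across languages."""
    f1_by_lang, em_by_lang = [], []
    for pairs in per_language.values():
        f1_by_lang.append(sum(token_f1(p, g) for p, g in pairs) / len(pairs))
        em_by_lang.append(sum(exact_match(p, g) for p, g in pairs) / len(pairs))
    n = len(per_language)
    return {"f1": sum(f1_by_lang) / n, "em": sum(em_by_lang) / n}


# Toy (prediction, gold) pairs keyed by hypothetical language codes.
toy = {
    "fi": [("Helsinki", "Helsinki"), ("vuonna 1917", "1917")],
    "sw": [("Dodoma", "Nairobi")],
}
scores = macro_average(toy)
print(scores)
```

Averaging per language first, then across languages, keeps a language with many examples from dominating the headline number, which is one plausible reading of "averaged across 11 languages".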

Index Score

66.1
Adoption
72
Quality
89
Freshness
67
Citations
78
Engagement
0
