Dataset · multilingual · v1.1

TyDi QA Dataset

by Google Research · open-source · Last verified 2026-03-17

TyDi QA is a question-answering benchmark covering 11 typologically diverse languages. Native speakers wrote each question before seeing the answer passage, so the questions reflect genuine information-seeking intent. The languages span a range of scripts, morphology, and syntax, which stress-tests multilingual QA systems on more than the high-resource languages they are usually tuned for.

https://huggingface.co/datasets/copenlu/answerable_tydiqa
Overall: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B · Citations: A · Engagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: multilingual-qa, reading-comprehension, information-seeking-evaluation
Integrations: huggingface-datasets
Use Cases: model-evaluation, multilingual-qa, cross-lingual-research
API Available: No
Tags: question-answering, multilingual, typologically-diverse, google, information-seeking
Added: 2026-03-17
Completeness: 100%
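
Since the listing names huggingface-datasets as the integration, a minimal Python sketch of loading the dataset from the URL above and tallying examples per language might look like the following. The split name "train" and the column name "language" are assumptions rather than details stated on this page, and the helper names are hypothetical; check the dataset card before relying on them.

```python
from collections import Counter
from typing import Iterable, Mapping

# Dataset ID taken from the Hugging Face URL in this listing.
DATASET_ID = "copenlu/answerable_tydiqa"


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    The default split name is an assumption.
    """
    # Imported lazily so the offline helper below works without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def count_by_language(records: Iterable[Mapping], field: str = "language") -> Counter:
    """Count examples per language; `field` is an assumed column name."""
    return Counter(str(record[field]).lower() for record in records)


if __name__ == "__main__":
    # Toy records standing in for real rows; not actual dataset content.
    toy = [{"language": "finnish"}, {"language": "Swahili"}, {"language": "finnish"}]
    print(count_by_language(toy))
```

With the library installed, `count_by_language(load_split())` would give a quick view of how evenly the languages are represented before running a per-language evaluation.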

Index Score

66.9
Adoption: 72
Quality: 88
Freshness: 68
Citations: 82
Engagement: 0
