TyDi QA Dataset
by Google Research · open-source · Last verified 2026-03-17
TyDi QA is a question-answering benchmark covering 11 typologically diverse languages. Questions were written by native speakers who had not yet seen the answer passage, so they reflect genuine information-seeking intent. The languages span a range of scripts, morphology, and syntax, which stress-tests multilingual QA systems beyond the biases of high-resource languages.
https://huggingface.co/datasets/copenlu/answerable_tydiqa
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B · Citations: A · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: multilingual-qa, reading-comprehension, information-seeking-evaluation
- Integrations: huggingface-datasets
- Use Cases: model-evaluation, multilingual-qa, cross-lingual-research
- API Available: No
- Tags: question-answering, multilingual, typologically-diverse, google, information-seeking
- Added: 2026-03-17
- Completeness: 100%
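Since the listing names huggingface-datasets as the integration, the sketch below shows one way the linked dataset might be loaded and summarised per language. It is an illustration under stated assumptions, not an official recipe: the split name `validation` and the column name `language` are assumptions that should be checked against the dataset card, and `count_by_language` and `load_validation_split` are hypothetical helper names introduced here.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_language(rows: Iterable[Mapping], key: str = "language") -> Counter:
    """Count examples per language.

    `key` is an assumed column name; adjust it to match the dataset card.
    """
    return Counter(row[key] for row in rows)


def load_validation_split():
    """Fetch one split from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so count_by_language stays usable without the library.
    from datasets import load_dataset

    # The split name "validation" is an assumption, not confirmed by this listing.
    return load_dataset("copenlu/answerable_tydiqa", split="validation")
```

Calling `count_by_language(load_validation_split())` would then give a per-language example count, which is a quick way to check language coverage before running a multilingual evaluation.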
Index Score: 66.9
- Adoption: 72
- Quality: 88
- Freshness: 68
- Citations: 82
- Engagement: 0