Benchmark · LLMs · v1.1

TyDi QA

by Clark et al. / Google Research · free · Last verified 2026-03-17

TyDi QA is a multilingual question-answering benchmark featuring 11 typologically diverse languages. Questions are written natively by speakers of each language, ensuring genuine linguistic challenges and avoiding translation artifacts. It is designed to evaluate reading comprehension across a wide range of language structures.

https://ai.google.com/research/tydiqa

C (Below Average)

Adoption: B+ · Quality: A · Freshness: B · Citations: F · Engagement: F

Specifications

License: Apache-2.0
Pricing: free
Capabilities: multilingual-question-answering, extractive-qa-evaluation, cross-lingual-transfer-assessment, reading-comprehension-benchmarking, typological-diversity-testing, zero-shot-evaluation, few-shot-evaluation
Integrations
API Available: No
Evaluated Models: gpt-4o, multilingual-bert, xlm-roberta-large, gemini-2-5-pro
Metrics: f1-score, exact-match
Methodology: Gold passage (GoldP) task, in which the model selects an answer span from a provided Wikipedia passage. GoldP F1 and EM are averaged across the 11 languages. Primary task is span extraction; secondary task is answer presence detection (boolean).
Last Run: 2025-12-20
Tags: question-answering, multilingual, typologically-diverse, reading-comprehension, nlp-benchmark, cross-lingual-transfer, dataset, evaluation, linguistic-diversity, extractive-qa
Added: 2026-03-17
Completeness: 80%
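
The Methodology entry above reports F1 and EM averaged across languages. A minimal sketch of that scoring scheme follows, assuming SQuAD-style token-overlap F1 with whitespace tokenization and a per-language macro-average. The function names (`normalize`, `token_f1`, `exact_match`, `macro_average`) and the toy examples are hypothetical and are not taken from the official TyDi QA evaluation script, which may normalize and tokenize differently (whitespace splitting in particular is a poor fit for languages written without spaces).

```python
import string
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (an assumption)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall over the bag-of-tokens overlap."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_average(examples):
    """Score (language, prediction, gold_answers) triples.

    Each example takes its best score over the gold answers; scores are
    averaged within each language, then the language means are averaged.
    """
    per_lang = defaultdict(lambda: {"f1": [], "em": []})
    for lang, pred, golds in examples:
        per_lang[lang]["f1"].append(max(token_f1(pred, g) for g in golds))
        per_lang[lang]["em"].append(max(exact_match(pred, g) for g in golds))
    lang_scores = {
        lang: {name: sum(vals) / len(vals) for name, vals in scores.items()}
        for lang, scores in per_lang.items()
    }
    overall = {
        name: sum(s[name] for s in lang_scores.values()) / len(lang_scores)
        for name in ("f1", "em")
    }
    return lang_scores, overall


# Toy, made-up predictions for illustration only.
toy_examples = [
    ("en", "the red car", ["red car"]),
    ("fi", "Helsinki", ["Helsinki"]),
    ("fi", "Turku", ["Helsinki"]),
]
lang_scores, overall = macro_average(toy_examples)
```

Averaging per language first keeps languages with many examples from dominating the headline number, which matches the "averaged across 11 languages" wording of the entry.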

