Dataset · multilingual · v1.0

NLLB Training Data

by Meta AI · open-source · Last verified 2026-03-17

The No Language Left Behind (NLLB) training corpus, released by Meta AI, contains high-quality parallel data covering 200+ languages, including newly mined bitext for dozens of low-resource languages. It was used to train the NLLB-200 model, which achieved state-of-the-art translation quality on low-resource language pairs.

https://huggingface.co/datasets/allenai/nllb
Overall grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: CC-BY-NC-4.0
Pricing: open-source
Capabilities: machine-translation, parallel-corpus, low-resource-translation
Integrations: huggingface-datasets
Use Cases: machine-translation, low-resource-nlp, translation-model-training
API Available: No
Tags: machine-translation, 200-languages, parallel-corpus, meta, low-resource
Added: 2026-03-17
Completeness: 100%
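
The Hugging Face integration and the dataset URL listed above suggest the corpus can be loaded one language pair at a time with the `datasets` library. The following is a minimal sketch, not an official recipe. The helper names `pair_config` and `load_pair` are hypothetical, and the config naming scheme (two language codes joined by a hyphen, in alphabetical order, as in `ace_Latn-ban_Latn`) is an assumption that should be checked against the dataset card.

```python
def pair_config(lang_a: str, lang_b: str) -> str:
    """Build a config name such as 'ace_Latn-ban_Latn' from two language codes.

    Assumption: configs are named '<code>-<code>' with the two codes in
    alphabetical order. Check the dataset card if a pair is not found.
    """
    first, second = sorted([lang_a, lang_b])
    return f"{first}-{second}"


def load_pair(lang_a: str, lang_b: str, split: str = "train"):
    """Stream one language pair from the Hugging Face Hub (needs network access)."""
    # Imported lazily so pair_config works even without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over examples instead of downloading the
    # whole pair up front, which matters for the larger mined pairs.
    return load_dataset(
        "allenai/nllb",
        pair_config(lang_a, lang_b),
        split=split,
        streaming=True,
    )


if __name__ == "__main__":
    # Argument order does not matter; the helper sorts the codes.
    print(pair_config("ban_Latn", "ace_Latn"))
```

Depending on the installed version of `datasets`, extra arguments may be required to load this repository, so treat the dataset card as the authority on the exact call.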

Index Score

Overall: 68.5
Adoption: 74
Quality: 88
Freshness: 73
Citations: 85
Engagement: 0
