OPUS-100
by University of Helsinki · free · Last verified 2026-03-17
OPUS-100 is a large-scale multilingual parallel corpus for machine translation. It covers 100 languages and is English-centric: every language pair has English on one side. Sampled from the OPUS collection, it provides up to 1 million sentence pairs per language pair and is widely used as a benchmark for training and evaluating multilingual models.
https://huggingface.co/datasets/Helsinki-NLP/opus-100
B—Above Average
Adoption: A · Quality: B+ · Freshness: B · Citations: A · Engagement: F
Specifications
- License: Various (source-dependent)
- Pricing: Free
- Capabilities: Multilingual machine translation model training, Cross-lingual transfer learning experiments, Low-resource language translation research, Benchmarking translation quality and systems, Lexicon and phrase table extraction, Development of sentence alignment algorithms, Fine-tuning large language models for translation tasks
- API Available: No
- Tags: parallel-corpus, machine-translation, multilingual-nlp, opus-corpus, low-resource-languages, cross-lingual-learning, nlp-dataset, sentence-alignment, text-data, 100-languages
- Added: 2026-03-17
- Completeness: 0.9%
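For the training and benchmarking capabilities listed above, a minimal sketch of reading sentence pairs from the Hugging Face copy of the corpus may help. It rests on assumptions not stated in this listing: that the `datasets` library is installed, that per-pair configurations are named like `en-fr`, and that each record has the shape `{"translation": {"en": ..., "fr": ...}}`. The helper names `to_pairs` and `load_opus100` are hypothetical, and the sample records are hand-written stand-ins, not corpus data.

```python
from typing import Dict, Iterable, List, Tuple


def to_pairs(records: Iterable[Dict], src: str, tgt: str) -> List[Tuple[str, str]]:
    """Turn records shaped like {"translation": {src: ..., tgt: ...}}
    into (source, target) tuples, skipping pairs with an empty side."""
    pairs = []
    for record in records:
        translation = record["translation"]
        source = translation[src].strip()
        target = translation[tgt].strip()
        if source and target:
            pairs.append((source, target))
    return pairs


def load_opus100(config: str = "en-fr", split: str = "validation"):
    """Assumption: needs `pip install datasets` and network access.
    The config name and split are illustrative, not taken from this listing."""
    # Imported lazily so to_pairs() works without the dependency installed.
    from datasets import load_dataset
    return load_dataset("Helsinki-NLP/opus-100", config, split=split)


# Hand-written stand-in records in the assumed schema (not corpus data).
sample = [
    {"translation": {"en": "Hello.", "fr": "Bonjour."}},
    {"translation": {"en": "  ", "fr": "Vide."}},  # empty English side, dropped
]
pairs = to_pairs(sample, "en", "fr")
```

With network access, `to_pairs(load_opus100("en-fr"), "en", "fr")` should produce the same kind of list for a real split, provided the assumed schema holds.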
Index Score: 68.1
- Adoption: 80
- Quality: 78
- Freshness: 69
- Citations: 82
- Engagement: 0