Dataset · multilingual · v1.0

OPUS-100

by University of Helsinki · free · Last verified 2026-03-17

OPUS-100 is a large-scale multilingual parallel corpus for machine translation covering 100 languages. It is English-centric: every sentence pair has English on one side. Sampled from the OPUS collection, it provides up to 1 million sentence pairs per language pair, and it is widely used for training and evaluating multilingual translation models.

https://huggingface.co/datasets/Helsinki-NLP/opus-100
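
A minimal sketch of reading sentence pairs from the dataset with the Hugging Face `datasets` library. It assumes that each language pair is exposed as a configuration named with the two codes in alphabetical order (for example `de-en` or `en-fr`) and that each record carries a `translation` dictionary keyed by language code. The helper names `config_name`, `iter_pairs`, and `load_pairs` are hypothetical and only serve the illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def config_name(lang: str, pivot: str = "en") -> str:
    """Build a config name such as 'de-en'.

    Assumption: configs join the two language codes in alphabetical order.
    """
    return "-".join(sorted([lang, pivot]))


def iter_pairs(
    records: Iterable[Dict[str, Dict[str, str]]], src: str, tgt: str
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentences from records shaped like
    {"translation": {"en": "...", "de": "..."}} (assumed record layout)."""
    for record in records:
        translation = record["translation"]
        yield translation[src], translation[tgt]


def load_pairs(lang: str, split: str = "train") -> Iterator[Tuple[str, str]]:
    """Download one English-centric language pair and iterate over it.

    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    dataset = load_dataset("Helsinki-NLP/opus-100", config_name(lang), split=split)
    return iter_pairs(dataset, "en", lang)
```

Calling `load_pairs("de")` would then iterate over English-German training pairs. Keeping `iter_pairs` separate from the download step makes it possible to check the record handling on a few hand-written records before fetching any data.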
Grade: B (Above Average)
Adoption: A · Quality: B+ · Freshness: B · Citations: A · Engagement: F

Specifications

License
Various (source-dependent)
Pricing
free
Capabilities
Multilingual machine translation model training, Cross-lingual transfer learning experiments, Low-resource language translation research, Benchmarking translation quality and systems, Lexicon and phrase table extraction, Development of sentence alignment algorithms, Fine-tuning large language models for translation tasks
API Available
No
Tags
parallel-corpus, machine-translation, multilingual-nlp, opus-corpus, low-resource-languages, cross-lingual-learning, nlp-dataset, sentence-alignment, text-data, 100-languages
Added
2026-03-17
Completeness
0.9%

Index Score

68.1
Adoption
80
Quality
78
Freshness
69
Citations
82
Engagement
0
