Dataset · multilingual · v1.0

OPUS-100

by University of Helsinki · free · Last verified 2026-03-17

OPUS-100 is a large-scale, English-centric multilingual parallel corpus for machine translation. It covers 100 languages, with each language paired with English. Sampled from the OPUS collection, it provides up to 1 million sentence pairs per language pair and is widely used to train and evaluate multilingual translation models.

https://huggingface.co/datasets/Helsinki-NLP/opus-100
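
A minimal sketch of reading sentence pairs from the corpus with the Hugging Face `datasets` library, using the repository id from the link above. The helper names (`config_name`, `to_pair`, `load_pairs`) are hypothetical. The sketch assumes that each language pair is exposed as a config named with the two codes in alphabetical order (for example `de-en`), and that each record has the form `{"translation": {"en": ..., "fr": ...}}`; check the dataset card before relying on either.

```python
from typing import Dict, List, Tuple


def config_name(lang_a: str, lang_b: str) -> str:
    """Build a config name such as "de-en".

    Assumption: config names list the two language codes alphabetically.
    """
    first, second = sorted([lang_a, lang_b])
    return f"{first}-{second}"


def to_pair(record: Dict[str, Dict[str, str]], src: str, tgt: str) -> Tuple[str, str]:
    """Extract (source, target) text from a record.

    Assumption: records look like {"translation": {src: "...", tgt: "..."}}.
    """
    translation = record["translation"]
    return translation[src], translation[tgt]


def load_pairs(src: str, tgt: str, split: str = "validation",
               limit: int = 5) -> List[Tuple[str, str]]:
    """Download one split and return its first `limit` sentence pairs.

    Requires network access and `pip install datasets`.
    """
    from datasets import load_dataset  # imported lazily so the helpers work offline

    ds = load_dataset("Helsinki-NLP/opus-100", config_name(src, tgt), split=split)
    return [to_pair(ds[i], src, tgt) for i in range(min(limit, len(ds)))]
```

Calling `load_pairs("en", "fr")` would then return a handful of English-French pairs from the validation split, assuming that split and config exist as described.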

C (Below Average)

Adoption: A · Quality: B+ · Freshness: B · Citations: F · Engagement: F

Specifications

License: Various (source-dependent)
Pricing: free
Capabilities: Multilingual machine translation model training, Cross-lingual transfer learning experiments, Low-resource language translation research, Benchmarking translation quality and systems, Lexicon and phrase table extraction, Development of sentence alignment algorithms, Fine-tuning large language models for translation tasks
API Available: No
Tags: parallel-corpus, machine-translation, multilingual-nlp, opus-corpus, low-resource-languages, cross-lingual-learning, nlp-dataset, sentence-alignment, text-data, 100-languages
Added: 2026-03-17
Completeness: 0.9%
