XL-Sum Dataset
by BUET (Bangladesh University of Engineering and Technology) · free · Last verified 2026-03-17
XL-Sum is a large-scale multilingual dataset for abstractive summarization. It contains over 1 million article-summary pairs collected from BBC News in 44 languages, many of them low-resource. This breadth makes it a key resource for training and evaluating multilingual and cross-lingual summarization models.
https://huggingface.co/datasets/csebuetnlp/xlsum
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F
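The entry lists no API, but the dataset is published on the Hugging Face Hub at the URL above. The sketch below shows one way to pull a single language into (article, summary) pairs for a sequence-to-sequence model. It assumes the Hugging Face `datasets` library is installed, that the repository exposes per-language configurations named like `"english"`, that it has `train`/`validation`/`test` splits, and that each record carries `text` and `summary` fields. These details come from the dataset card rather than from this catalog entry, so verify them there; `to_pairs` and `load_xlsum_pairs` are hypothetical helper names.

```python
from typing import Iterable, List, Mapping, Tuple


def to_pairs(records: Iterable[Mapping[str, str]]) -> List[Tuple[str, str]]:
    # Keep only the fields a summarization model needs:
    # the article body ("text") and its reference summary ("summary").
    # Field names are an assumption based on the dataset card.
    return [(r["text"], r["summary"]) for r in records]


def load_xlsum_pairs(language: str = "english", split: str = "test") -> List[Tuple[str, str]]:
    # Imported lazily so to_pairs() works without the library installed.
    from datasets import load_dataset

    # Assumed: one configuration per language, e.g. "english" or "bengali".
    ds = load_dataset("csebuetnlp/xlsum", language, split=split)
    # Iterating a Dataset yields one dict per row.
    return to_pairs(ds)
```

For example, `load_xlsum_pairs("english", "test")` would download the English test split and return its pairs. The download step is kept separate from `to_pairs` so the pairing logic can be reused on any records with the same fields. If loading fails, check the dataset card for the `datasets` version it supports, since the repository's packaging may not match newer library releases. Note also that the CC-BY-NC-SA-4.0 license restricts use to non-commercial purposes.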
Specifications
- License: CC-BY-NC-SA-4.0
- Pricing: free
- Capabilities: multilingual-text-summarization, cross-lingual-summarization-research, abstractive-summary-generation, low-resource-language-nlp, model-evaluation-and-benchmarking, transfer-learning-for-nlp, news-article-analysis
- API Available: No
- Tags: summarization, multilingual, news, bbc, nlp-dataset, abstractive-summarization, cross-lingual, text-generation, low-resource-languages, sequence-to-sequence
- Added: 2026-03-17
- Completeness: 0.7%
Index Score: 64.9
- Adoption: 71
- Quality: 85
- Freshness: 70
- Citations: 78
- Engagement: 0