Wikipedia Dump
by Wikimedia Foundation · open-source · Last verified 2026-03-17
The full-text dumps of Wikipedia articles, available in over 300 languages and regularly updated and distributed by the Wikimedia Foundation. Wikipedia is one of the most widely used components in language-model pretraining pipelines thanks to its high factual density, editorial quality, and broad topical coverage.
https://dumps.wikimedia.org
Overall grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: A · Citations: A+ · Engagement: F
Specifications
- License: CC-BY-SA-4.0
- Pricing: open-source
- Capabilities: language-modeling, question-answering, fact-checking, pretraining
- Integrations: hugging-face, tensorflow-datasets
- Use Cases: llm-pretraining, qa-systems, knowledge-grounding, rag
- API Available: Yes
- Tags: nlp, encyclopedic, factual, multilingual, pretraining
- Added: 2026-03-17
- Completeness: 100%
Index Score: 80.2
- Adoption: 95
- Quality: 90
- Freshness: 88
- Citations: 97
- Engagement: 0