LAION-400M Text Captions
by LAION · open-source · Last verified 2026-03-17
The text caption component of the LAION-400M image-text pair dataset, containing roughly 400 million English alt-text captions scraped from the web and filtered by CLIP image-text similarity score. The captions can also be used on their own, without the images, for NLP tasks such as concept grounding, visual question answering dataset construction, and multimodal embedding alignment research.
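The CLIP-similarity filtering mentioned above can be illustrated with a minimal sketch. It assumes image and caption embeddings have already been computed by some CLIP model and are available as arrays; the function name `filter_by_similarity`, the toy vectors, and the default cutoff of 0.3 are illustrative assumptions, not the dataset's actual pipeline code.

```python
import numpy as np

def filter_by_similarity(image_emb, text_emb, captions, threshold=0.3):
    """Keep captions whose embedding is similar enough to the paired image embedding.

    `threshold` is an assumed cutoff for illustration; check the dataset
    documentation for the value actually used.
    """
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    # L2-normalise each row so the row-wise dot product is the cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = np.sum(image_emb * text_emb, axis=1)
    keep = sims >= threshold
    kept = [caption for caption, k in zip(captions, keep) if k]
    return kept, sims

# Toy 2-D "embeddings" standing in for real CLIP outputs (hypothetical data).
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]]
captions = ["a red bicycle", "click here to subscribe", "two dogs on a beach"]

kept, sims = filter_by_similarity(images, texts, captions)
print(kept)
```

In this toy run the second pair has orthogonal vectors (similarity 0), so its caption is dropped while the other two are kept. Real embeddings are high-dimensional, but the filtering step has the same shape.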
https://laion.ai/blog/laion-400-open-dataset/
B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: caption-generation, image-text-alignment, concept-grounding
- Integrations: hugging-face
- Use Cases: vision-language-pretraining, caption-generation, research
- API Available: Yes
- Tags: nlp, captions, image-text, multilingual, clip
- Added: 2026-03-17
- Completeness: 100%
Index Score: 66.3
- Adoption: 74
- Quality: 76
- Freshness: 45
- Citations: 86
- Engagement: 0