LAION-400M Text Captions
by LAION · free · Last verified 2026-03-17
The text caption component of the LAION-400M dataset, offering 400 million English alt-text captions. The captions were scraped from the web and filtered with CLIP so that each one meets a minimum embedding similarity to its corresponding image. The captions can be used on their own, without the images, for large-scale NLP and multimodal research.
https://laion.ai/blog/laion-400-open-dataset/
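The CLIP filtering step mentioned in the description can be sketched as a cosine-similarity cutoff between an image embedding and its caption embedding. This is a minimal illustration, not LAION's actual pipeline: the function names `cosine_similarity` and `filter_captions`, the toy two-dimensional embeddings, and the default `threshold=0.3` are assumptions for the example. Real CLIP embeddings have hundreds of dimensions, and the exact cutoff should be checked against the announcement linked above.

```python
import math
from typing import List, Sequence, Tuple

# One record: (caption, image embedding, caption embedding).
Pair = Tuple[str, Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_captions(pairs: List[Pair], threshold: float = 0.3) -> List[str]:
    """Keep captions whose embedding is similar enough to the image embedding.

    The threshold default is an assumption for illustration only.
    """
    kept = []
    for caption, image_emb, text_emb in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept


# Toy embeddings standing in for CLIP outputs (hypothetical values).
toy_pairs: List[Pair] = [
    ("a dog on a beach", [1.0, 0.0], [0.9, 0.1]),  # nearly parallel: kept
    ("click here", [1.0, 0.0], [0.0, 1.0]),        # orthogonal: dropped
]
kept = filter_captions(toy_pairs)
```

In this toy run the first caption survives and the second is discarded, which mirrors how a similarity cutoff removes alt text that has little to do with its image.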
Grade: B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: free
- Capabilities: caption-generation, image-text-alignment, concept-grounding, large-scale-language-model-training, multimodal-model-pre-training, visual-question-answering-dataset-creation, zero-shot-classification-research, text-to-image-model-training
- Integrations:
- Use Cases:
- API Available: Yes
- Tags: nlp, captions, image-text, multilingual, clip, large-scale, web-scraped, multimodal-research, dataset, natural-language-processing, computer-vision
- Added: 2026-03-17
- Completeness: 0.8%
Index Score: 66.3
- Adoption: 74
- Quality: 76
- Freshness: 45
- Citations: 86
- Engagement: 0