Dataset · LLMs · v1.0

LAION-400M Text Captions

by LAION · open-source · Last verified 2026-03-17

The text caption component of the LAION-400M image-text pair dataset, containing 400 million English alt-text captions scraped from the web and filtered by CLIP image-text similarity scores. The captions can be used independently of the images for NLP tasks such as concept grounding, construction of visual question answering datasets, and research on multimodal embedding alignment.
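The CLIP similarity filtering mentioned above can be illustrated with a minimal sketch. It assumes each image-caption pair already has an image embedding and a text embedding, and it keeps a caption only when their cosine similarity reaches a threshold. The function names, the toy three-dimensional vectors, and the default threshold of 0.3 are assumptions made for illustration, not details taken from this listing; real CLIP embeddings are produced by a CLIP model and are much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def filter_captions(pairs, threshold=0.3):
    """Keep captions whose image/text embedding similarity is >= threshold.

    pairs: iterable of (caption, image_embedding, text_embedding).
    threshold: hypothetical cutoff, assumed here for illustration.
    """
    kept = []
    for caption, image_emb, text_emb in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept


# Toy embeddings standing in for real CLIP outputs (assumption).
toy_pairs = [
    ("a red bicycle leaning on a wall", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ("click here to subscribe", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]
kept_captions = filter_captions(toy_pairs)
print(kept_captions)
```

In this toy run the first caption points in nearly the same direction as its image embedding and is kept, while the second is orthogonal to it and is dropped.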

https://laion.ai/blog/laion-400-open-dataset/
B · Above Average
Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F

Specifications

License
CC-BY-4.0
Pricing
open-source
Capabilities
caption-generation, image-text-alignment, concept-grounding
Integrations
hugging-face
Use Cases
vision-language-pretraining, caption-generation, research
API Available
Yes
Tags
nlp, captions, image-text, multilingual, clip
Added
2026-03-17
Completeness
100%

Index Score

66.3
Adoption
74
Quality
76
Freshness
45
Citations
86
Engagement
0
