Dataset · LLMs · v1.0

LAION-400M Text Captions

by LAION · free · Last verified 2026-03-17

The text caption component of the LAION-400M dataset, offering 400 million English alt-text captions. The captions were scraped from the web and filtered with CLIP so that each one meets a minimum similarity to its corresponding image. They can be used on their own, without the images, for large-scale NLP and multimodal research.
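The similarity filter described above can be sketched in a few lines. This is a minimal illustration, not LAION's actual pipeline: the embeddings are toy 3-dimensional vectors rather than real CLIP outputs, the names `cosine_similarity` and `filter_captions` are hypothetical, and the 0.3 cutoff is an assumption (the listing only says "a minimum similarity").

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def filter_captions(pairs, threshold=0.3):
    """Keep captions whose text embedding is close enough to the image embedding.

    pairs: iterable of (caption, image_embedding, text_embedding).
    threshold: assumed cutoff; not stated in this listing.
    """
    return [
        caption
        for caption, image_emb, text_emb in pairs
        if cosine_similarity(image_emb, text_emb) >= threshold
    ]

# Toy stand-ins for CLIP embeddings (real ones have hundreds of dimensions).
toy_pairs = [
    ("a red bicycle leaning on a wall", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ("click here to subscribe", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]

kept = filter_captions(toy_pairs)
print(kept)
```

In this toy run the first caption's embedding points nearly the same way as its image embedding and is kept, while the second is orthogonal to it and is dropped. In practice the embeddings would come from a CLIP image encoder and text encoder.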

https://laion.ai/blog/laion-400-open-dataset/

Overall: C (Below Average)

Adoption: B+ · Quality: B+ · Freshness: C · Citations: F · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: free
Capabilities: caption-generation, image-text-alignment, concept-grounding, large-scale-language-model-training, multimodal-model-pre-training, visual-question-answering-dataset-creation, zero-shot-classification-research, text-to-image-model-training
API Available: Yes
Tags: nlp, captions, image-text, multilingual, clip, large-scale, web-scraped, multimodal-research, dataset, natural-language-processing, computer-vision
Added: 2026-03-17
Completeness: 0.8%
