Dataset · Computer Vision · v1.0

CC12M (Conceptual 12M)

by Google · free · Last verified 2026-03-17

CC12M is a large-scale dataset from Google containing about 12 million image-text pairs collected from the web, with captions derived from image alt-text. It was built with a less restrictive filtering process than its predecessor, CC3M, trading some caption precision for greater scale and diversity. This makes it a widely used resource for pretraining vision-language models in the style of CLIP and ALIGN.

https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset
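
The mirror linked above is published in WebDataset form, meaning tar shards in which the files belonging to one sample share a basename. As a rough sketch of how such a shard could be read with only the Python standard library, the hypothetical helper below pairs each image with its caption. The `jpg` and `txt` extensions and the function name `iter_image_text_pairs` are assumptions made for illustration, not something this listing confirms, so check a real shard before relying on them.

```python
import tarfile
from typing import BinaryIO, Dict, Iterator, Optional, Tuple


def iter_image_text_pairs(
    fileobj: BinaryIO,
    image_ext: str = "jpg",   # assumed image extension inside the shard
    text_ext: str = "txt",    # assumed caption extension inside the shard
) -> Iterator[Tuple[str, bytes, str]]:
    """Yield (key, image_bytes, caption) from one WebDataset-style tar shard.

    Files of a sample are expected to be adjacent and to share everything
    before the first dot of their basename, e.g. 000001.jpg and 000001.txt.
    Samples missing either the image or the caption are skipped.
    """
    current_key: Optional[str] = None
    sample: Dict[str, bytes] = {}

    def complete(s: Dict[str, bytes]) -> bool:
        return image_ext in s and text_ext in s

    # "r|*" reads the archive as a forward-only stream, so it also works
    # for non-seekable sources such as an HTTP response body.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            head, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{head}/{stem}" if head else stem
            if key != current_key:
                # A new key means the previous sample is finished.
                if current_key is not None and complete(sample):
                    caption = sample[text_ext].decode("utf-8").strip()
                    yield current_key, sample[image_ext], caption
                current_key, sample = key, {}
            extracted = tar.extractfile(member)
            if extracted is not None:
                sample[ext] = extracted.read()

    # Flush the last sample in the shard.
    if current_key is not None and complete(sample):
        caption = sample[text_ext].decode("utf-8").strip()
        yield current_key, sample[image_ext], caption
```

To try it on a downloaded shard, open the tar file in binary mode and loop over `iter_image_text_pairs(f)`. Web-crawled alt-text is noisy, so a real pipeline would likely add caption filtering and image decoding checks on top of this.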

B: Above Average

Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F

Specifications

License: Custom
Pricing: free
Capabilities: vision-language-pretraining, zero-shot-image-classification, text-to-image-retrieval, image-to-text-retrieval, visual-question-answering-pretraining, image-captioning-model-training, visual-concept-learning, benchmarking-foundation-models
API Available: Yes
Tags: multimodal, image-text, web-crawl, vision-language, pretraining, large-scale-dataset, google-research, alt-text, noisy-data, foundation-models, zero-shot-learning
Added: 2026-03-17
Completeness: 0.9%

Index Score

65.1
