Dataset · Computer Vision · v1.0

CC12M (Conceptual 12M)

by Google · free · Last verified 2026-03-17

CC12M (Conceptual 12M) is a Google dataset of roughly 12 million image-text pairs collected from the web, with captions drawn from image alt-text. It was built by relaxing the filtering pipeline used for its predecessor, CC3M, trading some caption precision for greater scale and diversity. It is widely used for pretraining vision-language models in the style of CLIP and ALIGN.

https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset
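The link above points to a WebDataset-packaged copy. In the WebDataset convention, each shard is a tar archive in which files sharing a basename form one sample. The sketch below illustrates that grouping with only the Python standard library, on a tiny in-memory shard. The `.jpg`/`.txt` extensions, the sample keys, and the helpers `make_demo_shard` and `iter_pairs` are assumptions made for illustration; the actual shard layout of the linked repository has not been checked here.

```python
import io
import tarfile
from collections import defaultdict


def make_demo_shard(samples):
    """Build an in-memory tar shard; each sample is (key, image_bytes, caption)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, image_bytes, caption in samples:
            # Assumed layout: <key>.jpg holds the image, <key>.txt holds the caption.
            for ext, payload in ((".jpg", image_bytes), (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + ext)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_pairs(fileobj):
    """Yield (key, image_bytes, caption) by grouping tar members on their basename."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            grouped[key][ext] = tar.extractfile(member).read()
    for key, parts in grouped.items():
        # Skip incomplete samples that lack either the image or the caption.
        if "jpg" in parts and "txt" in parts:
            yield key, parts["jpg"], parts["txt"].decode("utf-8")


# Placeholder bytes stand in for real JPEG data.
shard = make_demo_shard([
    ("000000", b"\xff\xd8fake-jpeg-bytes", "a dog running on a beach"),
    ("000001", b"\xff\xd8more-fake-bytes", "a red bicycle against a wall"),
])
pairs = list(iter_pairs(shard))
```

For real shards, a streaming loader is usually preferable to grouping a whole archive in memory; the version here favors readability over scale.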
Overall grade: B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F

Specifications

License: Custom
Pricing: free
Capabilities: vision-language-pretraining, zero-shot-image-classification, text-to-image-retrieval, image-to-text-retrieval, visual-question-answering-pretraining, image-captioning-model-training, visual-concept-learning, benchmarking-foundation-models
API Available: Yes
Tags: multimodal, image-text, web-crawl, vision-language, pretraining, large-scale-dataset, google-research, alt-text, noisy-data, foundation-models, zero-shot-learning
Added: 2026-03-17
Completeness: 0.9%

Index Score: 65.1
Adoption: 73
Quality: 77
Freshness: 43
Citations: 82
Engagement: 0
