CC12M (Conceptual 12M)
by Google · open-source · Last verified 2026-03-17
A dataset of 12 million image-text pairs harvested from the web, designed as a larger successor to Conceptual Captions (CC3M) with a relaxed filtering pipeline that trades some caption precision for scale. CC12M has been widely used for pretraining vision-language models on diverse visual concepts and for studying the trade-off between dataset scale and quality.
https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset
B · Above Average
Adoption: B+ · Quality: B+ · Freshness: C · Citations: A · Engagement: F
Specifications
- License
- Custom
- Pricing
- open-source
- Capabilities
- vision-language-pretraining, image-captioning, concept-learning
- Integrations
- hugging-face
- Use Cases
- vision-language-pretraining, image-captioning, research
- API Available
- Yes
- Tags
- multimodal, image-text, web-crawl, vision-language, pretraining
- Added
- 2026-03-17
- Completeness
- 100%
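The Hugging Face mirror linked above distributes CC12M as WebDataset-style `.tar` shards, where each sample is a group of files sharing a basename key (e.g. an image plus its caption). The sketch below illustrates that grouping convention on a tiny in-memory shard with made-up keys and captions, so it runs without downloading anything; it is an assumption-labeled illustration of the format, not the official loader.

```python
import io
import tarfile

def make_demo_shard() -> bytes:
    """Build a tiny WebDataset-style tar shard in memory (fake data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, caption in [("000000", "a dog on a beach"),
                             ("000001", "city skyline at night")]:
            # Each sample = files sharing a key: 000000.jpg + 000000.txt
            for name, payload in [(f"{key}.jpg", b"\xff\xd8fake-jpeg"),
                                  (f"{key}.txt", caption.encode())]:
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def iter_samples(shard_bytes: bytes) -> dict:
    """Group shard members by basename key -> {'jpg': bytes, 'txt': bytes}."""
    samples: dict = {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes)) as tar:
        for member in tar.getmembers():
            key, ext = member.name.rsplit(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples

samples = iter_samples(make_demo_shard())
print(samples["000000"]["txt"].decode())  # a dog on a beach
```

In practice, libraries such as `webdataset` or the Hugging Face `datasets` streaming mode handle this grouping for you over the real shards; the point here is only the on-disk sample convention.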
Index Score: 65.1
- Adoption: 73
- Quality: 77
- Freshness: 43
- Citations: 82
- Engagement: 0