Learning Transferable Visual Models From Natural Language Supervision (CLIP)
by OpenAI · open-source · Last verified 2026-03-17
Introduced CLIP (Contrastive Language-Image Pre-training), a model trained with a contrastive objective on 400 million web-collected image-text pairs, which achieves strong zero-shot transfer to a wide range of vision tasks. CLIP became foundational for vision-language alignment and for generative-AI pipelines.
https://arxiv.org/abs/2103.00020
Overall grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A+ · Engagement: F
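CLIP's training objective, as described in the abstract above, is a symmetric contrastive loss over a batch of image-text pairs: matched pairs sit on the diagonal of the image-text similarity matrix and are pushed up, while all mismatched pairings are pushed down. A minimal NumPy sketch of that objective (the 0.07 temperature follows the paper's initialization value; in the real model the temperature is a learned parameter, and the embeddings below are stand-ins for the encoders' outputs):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of N image-text pairs.
    Row i of image_emb and row i of text_emb form a matched pair."""
    # L2-normalize so the dot product is cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix, sharpened by the temperature
    logits = image_emb @ text_emb.T / temperature

    n = logits.shape[0]
    labels = np.arange(n)  # matched pairs sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), y].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

With orthonormal, perfectly matched embeddings (e.g. `np.eye(4)` for both sides) the loss is near zero; random, unrelated embeddings give a substantially larger loss.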
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: zero-shot-classification, image-text-matching, feature-extraction, retrieval
- Integrations: huggingface, openai-api
- Use Cases: zero-shot-image-classification, image-retrieval, vision-language-alignment
- API Available: Yes
- Tags: clip, contrastive-learning, zero-shot, multimodal, vision-language
- Added: 2026-03-17
- Completeness: 100%
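The zero-shot-image-classification use case listed above works by embedding one text prompt per candidate label (e.g. "a photo of a dog") with CLIP's text encoder, then picking the label whose prompt embedding is most cosine-similar to the image embedding. A sketch assuming the embeddings have already been computed; the stand-in vectors below are illustrative, not real CLIP outputs (real ones would come from the encoders, e.g. via the huggingface integration listed above):

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs):
    """Return the index of the candidate label whose prompt embedding
    is most cosine-similar to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    prompts = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    return int(np.argmax(prompts @ img))

# Illustrative stand-in embeddings, one prompt row per label
prompts = np.eye(3)
image = np.array([0.1, 0.9, 0.2])  # most similar to label 1
print(zero_shot_classify(image, prompts))  # → 1
```

No per-task training is needed: swapping the label set only changes which prompts are embedded, which is what makes the classifier "zero-shot".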
Index Score: 82.2
- Adoption: 97
- Quality: 96
- Freshness: 74
- Citations: 97
- Engagement: 0