Paper · Computer Vision · v1.0

Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)

by OpenAI · free · Last verified 2026-03-17

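Presents DALL-E 2 (unCLIP), a two-stage text-conditional image generation system: a prior that generates a CLIP image embedding from a text caption, and a diffusion decoder that generates an image conditioned on that embedding. The paper reports that explicitly generating image representations improves sample diversity with minimal loss in photorealism and caption similarity, and that the decoder can also produce variations of an existing image.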
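The two-stage data flow described above (caption, then image embedding, then image) can be illustrated with a toy sketch. This is not the paper's implementation and does not use any real CLIP or diffusion model: every function name here (`toy_text_embedding`, `toy_prior`, `toy_decoder`, `generate`) and the constant `EMBED_DIM` are hypothetical stand-ins that only mimic the shape of the pipeline with random numbers.

```python
import random
from typing import List

EMBED_DIM = 8  # toy size; an assumption, not the real CLIP embedding width


def toy_text_embedding(caption: str) -> List[float]:
    """Stand-in for a CLIP text encoder: a deterministic pseudo-embedding."""
    rng = random.Random(caption)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def toy_prior(text_emb: List[float], seed: int = 0) -> List[float]:
    """Stand-in for the prior: maps a text embedding to an image embedding.

    Here it only perturbs the text embedding with small Gaussian noise.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 0.1) for t in text_emb]


def toy_decoder(image_emb: List[float], size: int = 4, seed: int = 0) -> List[List[float]]:
    """Stand-in for the diffusion decoder: returns a size x size pixel grid.

    The grid is random noise shifted by the mean of the image embedding,
    so the output depends on the conditioning signal.
    """
    rng = random.Random(seed)
    bias = sum(image_emb) / len(image_emb)
    return [[bias + rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


def generate(caption: str, seed: int = 0) -> List[List[float]]:
    """Chain the stages: caption -> text embedding -> image embedding -> image."""
    text_emb = toy_text_embedding(caption)
    image_emb = toy_prior(text_emb, seed=seed)
    return toy_decoder(image_emb, seed=seed)


if __name__ == "__main__":
    image = generate("a corgi playing a trumpet", seed=1)
    print(len(image), "x", len(image[0]), "toy image")
```

Changing `seed` while keeping the caption fixed gives a different output for the same text, which loosely mirrors how a sampled image embedding lets one caption yield several distinct images.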

https://arxiv.org/abs/2204.06125
B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: text-to-image, image-editing, image-variation, inpainting
Integrations: openai-api
Use Cases: creative-content-generation, image-editing, design-prototyping
API Available: Yes
Tags: dall-e-2, text-to-image, diffusion, clip, generative-ai
Added: 2026-03-17
Completeness: 100%

Index Score

77.1
Adoption: 90
Quality: 93
Freshness: 76
Citations: 90
Engagement: 0
