Paper · Computer Vision · v1.0

Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)

by OpenAI · free · Last verified 2026-03-17

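Presents DALL-E 2 (unCLIP), a two-stage text-conditional image generation system: a prior generates a CLIP image embedding from a text caption, and a diffusion decoder generates an image conditioned on that embedding. The paper reports that explicitly generating image representations improves sample diversity with minimal loss in photorealism and caption similarity, and that the CLIP latent space enables image variations and text-guided image manipulation.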
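The summary above describes a system built from a prior over CLIP embeddings and a diffusion decoder. The data flow between those stages can be sketched roughly as below. This is only an illustrative toy, not the paper's implementation: the function names (`toy_text_embedding`, `toy_prior`, `toy_decoder`, `generate`), the 8-dimensional embedding, and the noise-based "prior" are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import hashlib
import random
from typing import List

EMBED_DIM = 8  # toy size (assumption); real CLIP embeddings are much larger


def toy_text_embedding(caption: str) -> List[float]:
    """Hypothetical stand-in for a CLIP text encoder.

    Derives a deterministic pseudo-embedding from a hash of the caption.
    """
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def toy_prior(text_emb: List[float], seed: int) -> List[float]:
    """Hypothetical stand-in for the prior: text embedding -> image embedding.

    Adds seeded Gaussian noise so one caption can map to several embeddings.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 0.1) for x in text_emb]


def toy_decoder(image_emb: List[float], size: int = 4) -> List[List[float]]:
    """Hypothetical stand-in for the diffusion decoder.

    Maps an image embedding to a size x size grid of values clamped to [0, 1].
    """
    def pixel(r: int, c: int) -> float:
        value = (image_emb[(r * size + c) % len(image_emb)] + 1.0) / 2.0
        return min(1.0, max(0.0, value))

    return [[pixel(r, c) for c in range(size)] for r in range(size)]


def generate(caption: str, seed: int = 0) -> List[List[float]]:
    """Two-stage flow: caption -> text embedding -> image embedding -> image."""
    text_emb = toy_text_embedding(caption)
    image_emb = toy_prior(text_emb, seed)
    return toy_decoder(image_emb)


if __name__ == "__main__":
    image = generate("a corgi playing a trumpet", seed=0)
    print(len(image), "x", len(image[0]), "toy image generated")
```

Keeping the stages separate mirrors the structure in the summary: the same text embedding can be passed through the prior with different seeds to obtain different image embeddings, which is one way to picture where sample diversity enters such a pipeline.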

https://arxiv.org/abs/2204.06125

C+ (Average)

Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: text-to-image, image-editing, image-variation, inpainting
Integrations: openai-api
Use Cases: creative-content-generation, image-editing, design-prototyping
API Available: Yes
Tags: dall-e-2, text-to-image, diffusion, clip, generative-ai
Added: 2026-03-17
Completeness: 100%
