
DALL-E

DALL-E is a series of text-to-image generative AI models developed by OpenAI that create photorealistic and artistic images from natural language prompts using diffusion and language-vision alignment techniques.

Last updated May 2026

DALL-E is a family of text-to-image generative models developed by OpenAI, capable of producing detailed, photorealistic, and stylistically diverse images from natural language descriptions. Named as a portmanteau of the surrealist painter Salvador Dalí and the Pixar character WALL-E, the DALL-E model family was first announced in January 2021 and went through three major iterations before being succeeded by OpenAI's GPT Image series in 2025. DALL-E is widely credited with popularising text-to-image AI among the general public and accelerating commercial adoption of generative image technology across creative industries worldwide.

DALL-E (2021)

The original DALL-E was introduced in January 2021 and was built on a modified version of the GPT-3 language model. It treated a caption and its image as a single stream of tokens: text tokens followed by discrete image tokens, each image token representing a patch of the image and drawn from a codebook learned by a discrete variational autoencoder (dVAE). To generate an image, the model predicted the image tokens one at a time, conditioned on the caption. DALL-E demonstrated the ability to combine concepts in imaginative ways, rendering prompts such as "an armchair in the shape of an avocado" or "a snail made of harp". The 12-billion-parameter model was primarily a research demonstration and was not released publicly via an API.
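
The caption-plus-image token stream can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's code: it maps each image patch to its nearest codebook entry (a VQ-VAE-style lookup standing in for the dVAE, which the DALL-E paper trained with a Gumbel-softmax relaxation instead) and then places the image tokens after the text tokens. Shifting image ids past the text vocabulary is simply one way to keep the two kinds of token apart; it is an assumption of this sketch. The codebook, patch features and TEXT_VOCAB_SIZE are hypothetical toy values; the paper reports a 32×32 grid of image tokens drawn from an 8,192-entry codebook.

```python
import math
from typing import List, Sequence

# Hypothetical toy text vocabulary size (the real model's vocabularies were far larger).
TEXT_VOCAB_SIZE = 16


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2 distance)."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def image_to_tokens(patch_features: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each patch feature vector (in raster order) to a discrete token id."""
    return [nearest_code(p, codebook) for p in patch_features]


def build_sequence(text_tokens: Sequence[int], image_tokens: Sequence[int],
                   text_vocab_size: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Concatenate text and image tokens into one stream for an autoregressive model.

    Image ids are offset by the text vocabulary size so they never collide with text ids.
    """
    return list(text_tokens) + [text_vocab_size + t for t in image_tokens]


# Usage: a 2x2 "image" whose patches have already been encoded into 2-d feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [1.0, 0.9]]
image_tokens = image_to_tokens(patches, codebook)
print(image_tokens)                          # [0, 1, 2, 1]
print(build_sequence([3, 7], image_tokens))  # [3, 7, 16, 17, 18, 17]
```

A model trained on such sequences generates an image by sampling the image-token positions one by one after the caption; a decoder then turns the token grid back into pixels.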

DALL-E 2 (2022)

Announced in April 2022, DALL-E 2 was a significant architectural redesign. It replaced token-by-token generation with a diffusion model, the generative framework later popularised by Stable Diffusion, conditioned on embeddings from CLIP, which aligns language representations with image representations in a shared semantic space. DALL-E 2 produced higher-resolution images (up to 1024×1024 pixels) and introduced inpainting (editing specific regions of an image) and outpainting (extending an image beyond its original boundaries). Access began as a limited research preview in April 2022, widened through a beta programme during 2022, and a public API followed in November 2022.
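
CLIP learns its shared text-image space with a contrastive objective: within a batch of image-caption pairs, each caption's embedding should be most similar to its own image's embedding and dissimilar to the others, and vice versa. The sketch below computes that symmetric cross-entropy loss over cosine similarities in plain Python. It illustrates the objective only and is not CLIP's training code; the toy embeddings and the fixed temperature of 0.07 are assumptions (CLIP learns its temperature during training).

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(text_embs, image_embs, temperature: float = 0.07) -> List[List[float]]:
    """Row k, column j: cosine similarity of caption k and image j, divided by temperature."""
    texts = [normalize(v) for v in text_embs]
    images = [normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, i)) / temperature for i in images] for t in texts]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(text_embs, image_embs, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: caption k should match image k, and image k caption k."""
    logits = similarity_matrix(text_embs, image_embs, temperature)
    n = len(logits)
    text_to_image = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    image_to_text = sum(cross_entropy(columns[k], k) for k in range(n)) / n
    return (text_to_image + image_to_text) / 2


# Usage with hypothetical embeddings; pair k is caption k with image k.
captions = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
images = [[0.9, 0.1, 0.3], [0.1, 0.8, 0.0]]
print(clip_contrastive_loss(captions, images))        # lower: pairs are aligned
print(clip_contrastive_loss(captions, images[::-1]))  # higher: pairs are swapped
```

Minimising this loss pulls matching captions and images together in the shared space, which is what lets a text embedding stand in for "the image this prompt describes" when conditioning the generator.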

DALL-E 3 (2023)

DALL-E 3, announced in September 2023, represented a major improvement in prompt fidelity: the ability to render images that accurately reflect the detail and nuance of complex text descriptions. A key feature was native integration with ChatGPT, which let users refine images iteratively through conversation rather than through single-shot text prompts. DALL-E 3 also embedded C2PA (Coalition for Content Provenance and Authenticity) provenance metadata in generated images, providing a mechanism to identify AI-generated content. DALL-E 3 was formally deprecated on 12 May 2026.
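
As we understand the C2PA specification, a JPEG carries its provenance manifest in APP11 marker segments holding JUMBF boxes, and the manifest store is labelled "c2pa". The sketch below walks the JPEG marker segments that precede the compressed image data and reports whether any APP11 segment mentions that label. It is a presence heuristic only, under that assumption: it does not parse JUMBF or verify the cryptographic signatures that a real C2PA validator checks, and the function names and demo bytes are hypothetical.

```python
import struct
from typing import Iterator

SOI = b"\xff\xd8"  # start-of-image marker
APP11 = 0xEB       # APP11 marker, which C2PA uses for JUMBF boxes in JPEG (assumption noted above)
SOS = 0xDA         # start of scan: metadata segments precede this marker
EOI = 0xD9         # end-of-image marker


def app11_payloads(jpeg: bytes) -> Iterator[bytes]:
    """Yield the payload of every APP11 segment that appears before the image data."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (SOS, EOI):    # compressed data (or the end) follows; stop scanning
            break
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:  # markers without a length field
            pos += 2
            continue
        # Big-endian segment length, which counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        if marker == APP11:
            yield jpeg[pos + 4 : pos + 2 + length]
        pos += 2 + length


def has_c2pa_manifest(jpeg: bytes) -> bool:
    """Heuristic: True if any APP11 segment contains the 'c2pa' label. No signature check."""
    return any(b"c2pa" in payload for payload in app11_payloads(jpeg))


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (hypothetical helper for the demo below)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic marker layout only (not a decodable image) containing an APP11 'c2pa' box.
demo = (SOI + make_segment(0xE0, b"JFIF\x00")
        + make_segment(APP11, b"JP\x00\x01jumbc2pa")
        + b"\xff\xda\x00\x02\xff\xd9")
print(has_c2pa_manifest(demo))  # True
```

Because such metadata is easily stripped, for example by re-encoding an image or taking a screenshot, a False result does not show that an image was not AI-generated.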

GPT Image and Succession

In March 2025, OpenAI launched native image generation in GPT-4o, and in April 2025 made the underlying model available through its API as GPT Image 1 (gpt-image-1). GPT Image 1 offered significant improvements in rendering text within images, a persistent weakness across all DALL-E versions, and tighter integration with language reasoning. GPT Image 2, released in late 2025, achieved near-perfect text rendering in English (99% accuracy) and strong multilingual text performance in Chinese, Japanese, Korean, Hindi, Bengali, and Arabic. Within days of launch, GPT Image 2 took the top position on major image generation leaderboards by a substantial margin, effectively completing the transition away from the DALL-E product line.

Technical Approach

DALL-E models combine two components: one that understands the text prompt and one that generates the image. In the original DALL-E, a single transformer handled both, modelling text tokens and discrete image tokens as one sequence. DALL-E 2 and DALL-E 3 instead pair a text encoder (CLIP-based in DALL-E 2) with a diffusion model. The encoder maps the prompt into an embedding that guides generation: the diffusion model starts from random noise and refines it over many denoising steps, conditioning each step on the text embedding so that the output corresponds to the description.
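
The denoising loop can be illustrated with a toy sampler. The sketch below uses a DDIM-style deterministic update (one standard way of running a trained diffusion model; this article does not say which sampler DALL-E used) with the linear noise schedule from the original DDPM paper, both assumptions here. The noise predictor is hypothetical: it pretends the ideal clean "image" for a prompt is the conditioning vector itself, so the loop runs without a trained network. A real model learns to predict the noise from data, and real images have far more dimensions than this three-number example.

```python
import math
import random
from typing import List, Sequence


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (DDPM-style)."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_noise_predictor(x_t: Sequence[float], t: int, text_embedding: Sequence[float],
                        alpha_bars: Sequence[float]) -> List[float]:
    """HYPOTHETICAL stand-in for a trained network eps(x_t, t, text).

    Treats the text embedding as the clean target, so it can return the exact noise
    implied by x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    """
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for x, c in zip(x_t, text_embedding)]


def ddim_sample(text_embedding: Sequence[float], alpha_bars: Sequence[float], seed: int = 0) -> List[float]:
    """Deterministic DDIM sampling (eta = 0): start from pure noise, denoise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in text_embedding]  # x_T drawn from N(0, I)
    for t in reversed(range(len(alpha_bars))):
        eps = toy_noise_predictor(x, t, text_embedding, alpha_bars)  # text-conditioned prediction
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Estimate the clean image, then re-noise it to the (smaller) noise level of step t-1.
        x0_hat = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e for x0, e in zip(x0_hat, eps)]
    return x


# Usage: with the toy oracle, sampling recovers the conditioning vector.
alpha_bars = linear_alpha_bars(1000)
prompt_embedding = [0.5, -1.0, 2.0]  # hypothetical conditioning vector
sample = ddim_sample(prompt_embedding, alpha_bars, seed=42)
print([round(v, 6) for v in sample])  # [0.5, -1.0, 2.0]
```

Practical text-to-image samplers commonly add classifier-free guidance, which blends conditional and unconditional noise predictions to strengthen adherence to the prompt; this sketch omits it to keep the loop readable.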

The quality and diversity of training data have been central to DALL-E's capabilities. OpenAI trained the models on hundreds of millions of image-caption pairs collected from the internet, allowing them to learn visual-linguistic associations spanning art styles, technical diagrams, fantastical scenarios, and photorealistic scenes.

Impact and Criticism

DALL-E had a transformative effect on the creative technology landscape. Within months of DALL-E 2's release, text-to-image AI moved from academic novelty to mainstream consumer product: Midjourney and Stability AI's Stable Diffusion reached the public in 2022, and Google's Imagen and Adobe Firefly followed.

The model family attracted substantial criticism on intellectual property grounds. Training text-to-image models on copyrighted images without the consent of, or compensation for, the original creators prompted lawsuits from artists and photographers against several generative AI developers in multiple jurisdictions. Ongoing policy debates in the United States, the European Union, and elsewhere address whether such training is permitted, for example as fair use, or requires licensing frameworks. Content filters and usage policies restricting certain categories of output, such as images of public figures, trademarked characters, and violent imagery, represent OpenAI's partial response to these concerns, though critics argue they are insufficient.
