What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

Diffusion Model

A class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.

7 min read | Last updated May 2026 | Foundations

Diffusion models are a family of generative machine learning models that synthesise realistic data (most notably images, but also audio, video, and three-dimensional structures) by training a neural network to reverse a gradual noise-corruption process. Starting from pure random noise, the model progressively removes noise over many steps until a coherent data sample emerges. By conditioning this denoising process on a text description, an image, or other inputs, diffusion models can generate outputs that closely match the specified conditions. Since 2022, diffusion models have supplanted earlier generative approaches such as generative adversarial networks (GANs) as the dominant architecture in high-fidelity image generation.[^1]

The Forward Diffusion Process

Training a diffusion model begins with data, typically images from a large corpus, and the definition of a forward process that gradually corrupts that data over a series of T timesteps. At each timestep t, Gaussian noise is added to the image according to a predetermined noise schedule, which specifies how much noise is added at each step. After a sufficiently large number of steps (typically T = 1,000), the original image has been corrupted to nearly pure Gaussian noise, statistically indistinguishable from random static.
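The forward process can be sketched in a few lines. The example below assumes the formulation of Ho et al. (2020), in which a noisy sample at any timestep can be drawn directly from the clean sample, and a linear noise schedule from 0.0001 to 0.02 over 1,000 steps (the values reported in that paper; other schedules are in use). The function names and the four-value toy "image" are illustrative only.

```python
import math
import random

T = 1000  # number of timesteps, as in the text

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from the clean sample x0 to the noisy sample at step t."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise

betas = linear_beta_schedule(T)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.8, -0.3, 0.1, 0.5]  # toy "image" of four pixel values
xt, noise = add_noise(x0, 500, alpha_bars, rng)
```

Under this schedule the signal scale at the final step falls below 1% of its original value, which is why the fully corrupted sample is treated as pure noise.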

The forward process is not learned; it is a fixed mathematical operation. It exists to generate training examples: pairs of a noisy image at timestep t and the corresponding noise that was added to reach that state.
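A training example of this kind, and the loss computed on it, might look like the following sketch. It assumes the noise-prediction objective of Ho et al. (2020), where the loss is the mean squared error between the predicted and the true noise. `untrained_predictor` is a hypothetical stand-in for the neural network, and the value 0.5 for the cumulative schedule term is chosen purely for illustration.

```python
import math
import random

def make_training_pair(x0, alpha_bar_t, rng):
    """Return (noisy sample, noise that was added) for one timestep."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * n
          for x, n in zip(x0, noise)]
    return xt, noise

def mse(predicted, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

def untrained_predictor(xt, t):
    """Hypothetical stand-in for the network: always predicts zero noise."""
    return [0.0 for _ in xt]

rng = random.Random(0)
x0 = [0.8, -0.3, 0.1, 0.5]
alpha_bar_t = 0.5  # assumed value of the cumulative schedule at some step t
xt, true_noise = make_training_pair(x0, alpha_bar_t, rng)
loss = mse(untrained_predictor(xt, t=500), true_noise)
perfect_loss = mse(true_noise, true_noise)  # a perfect prediction gives zero loss
```

Training lowers this loss across many images and randomly chosen timesteps, so the network gradually learns to identify the noise in any noisy input.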

The Reverse Denoising Process

The neural network in a diffusion model is trained to predict the noise component in a noisy image at any given timestep, or equivalently to predict the original clean image. Given a noisy image at timestep t, the sampler uses the network's noise estimate to remove a small amount of noise, producing the slightly cleaner image for timestep t - 1. Repeating this denoising operation across all timesteps, from t = T (pure noise) down to t = 0, produces a complete, clean generated sample.
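The denoising loop can be sketched as follows, assuming the ancestral sampling rule of Ho et al. (2020) with a linear schedule and the per-step variance set to beta_t (one of several choices in use). No trained network is available here, so `oracle_noise` is a hypothetical test device that already knows the target sample; a real model would supply a learned noise prediction instead.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear betas plus their running products (assumed schedule)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def sample(predict_noise, dim, betas, alpha_bars, rng):
    """Run the reverse chain from pure noise down to a clean sample."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # one common variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no fresh noise on the final step
    return x

betas, alpha_bars = make_schedule(1000)
target = [0.8, -0.3, 0.1, 0.5]

def oracle_noise(x, t):
    """Hypothetical 'perfect network' that already knows the target sample."""
    scale = math.sqrt(alpha_bars[t])
    return [(xi - scale * ti) / math.sqrt(1.0 - alpha_bars[t])
            for xi, ti in zip(x, target)]

generated = sample(oracle_noise, len(target), betas, alpha_bars, random.Random(0))
```

With the oracle in place the chain lands on the target sample, which checks that the update rule is consistent with the forward process. With a trained network, the same loop produces new samples rather than reproducing a known one.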

The neural network architecture used for this denoising task has evolved. Early diffusion models used a U-Net, a convolutional architecture with skip connections between encoder and decoder layers, which proved effective at capturing multi-scale image structure. Stable Diffusion 3 (2024) and subsequent models replaced the U-Net with a Diffusion Transformer (DiT), applying self-attention across the spatial tokens of a noisy latent representation.[^2]
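The "spatial tokens" a transformer operates on are typically obtained by cutting the latent into small patches and flattening each one. The sketch below shows only that step, using a hypothetical 4-channel 8 x 8 latent and a patch size of 2; in a real Diffusion Transformer each token would then pass through a learned linear projection and receive a position embedding before the attention layers.

```python
def patchify(latent, patch):
    """Split a [channels][height][width] latent into flat patch tokens."""
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    assert height % patch == 0 and width % patch == 0
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Gather every channel of one patch into a single flat vector.
            token = [latent[c][top + dy][left + dx]
                     for c in range(channels)
                     for dy in range(patch)
                     for dx in range(patch)]
            tokens.append(token)
    return tokens

# Toy latent: 4 channels, 8 x 8 grid, each value encodes its own position.
latent = [[[c * 100 + y * 8 + x for x in range(8)] for y in range(8)]
          for c in range(4)]
tokens = patchify(latent, patch=2)
print(len(tokens), len(tokens[0]))  # 16 16
```

Self-attention then relates every token to every other token, so the sequence length (here 16) rather than the raw pixel count drives the cost of each layer.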

Latent Diffusion Models

Running the denoising process directly in pixel space is computationally expensive for high-resolution images. Latent diffusion models (LDMs), introduced by Rombach et al. in 2022 and commercialised as Stable Diffusion, address this by performing the diffusion process in a compressed latent space rather than in pixel space.[^3]

A separate encoder (typically a variational autoencoder, or VAE) compresses the input image into a lower-dimensional latent representation. The diffusion process is applied to this latent representation, and a decoder reconstructs the final image from the denoised latent vector. Operating in latent space reduces computational requirements by an order of magnitude, enabling generation at a fraction of the cost of pixel-space diffusion and making the technique practical on consumer GPUs.
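A quick calculation illustrates the scale of the saving. It assumes the configuration used by the original Stable Diffusion release: a 512 x 512 RGB image encoded by an autoencoder with 8x spatial downsampling into a 4-channel latent. The helper names are illustrative.

```python
def element_count(height, width, channels):
    """Number of values the denoising network must process per sample."""
    return height * width * channels

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent grid size for an autoencoder with the given spatial downsampling."""
    return height // downsample, width // downsample, latent_channels

pixel_elements = element_count(512, 512, 3)
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_elements = element_count(lat_h, lat_w, lat_c)
compression = pixel_elements / latent_elements
print(pixel_elements, latent_elements, compression)  # 786432 16384 48.0
```

Element count is only a proxy for compute, since the actual cost depends on the network architecture, but it shows why each denoising step is far cheaper in latent space.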

Text Conditioning

The ability to generate images from text descriptions is achieved through conditioning the denoising process on a text embedding. The text prompt is encoded using a pre-trained language model or CLIP text encoder, producing embedding vectors that represent the text semantics. These embeddings are injected into the denoising network at each timestep, typically via cross-attention layers, allowing the network to generate images that align with the specified description.
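The cross-attention mechanism can be illustrated with a minimal single-head sketch: queries come from image tokens, while keys and values come from text tokens. The learned query, key, and value projections of a real network are omitted here (treated as identity), and the three-dimensional toy embeddings are invented for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(image_tokens, text_tokens):
    """Single-head attention: queries from the image, keys and values from the text."""
    dim = len(text_tokens[0])
    outputs, weights = [], []
    for query in image_tokens:
        scores = [dot(query, key) / math.sqrt(dim) for key in text_tokens]
        attn = softmax(scores)
        # Each image token receives a weighted mix of the text embeddings.
        mixed = [sum(w * value[i] for w, value in zip(attn, text_tokens))
                 for i in range(dim)]
        outputs.append(mixed)
        weights.append(attn)
    return outputs, weights

text_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two toy word embeddings
image_tokens = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
outputs, weights = cross_attention(image_tokens, text_tokens)
```

The first image token attends mostly to the first word and the second to the second word, while the third, which matches neither, splits its attention evenly. This is how different regions of an image can draw on different parts of the prompt.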

Classifier-free guidance (CFG), introduced by Ho and Salimans in 2022, is a technique that amplifies text conditioning without a separate classifier network.[^4] The model is trained to denoise both with and without the text condition (the condition is randomly dropped during training), and at inference time the conditional and unconditional predictions are combined using a guidance scale parameter, which pushes the output further in the direction of the conditional prediction. Higher guidance scales produce images that more closely match the text description at the cost of some diversity.
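The guidance step itself is a one-line combination of the two noise predictions. The sketch below assumes the parameterisation commonly used in open-source implementations, where a scale of 1 returns the plain conditional prediction and larger values extrapolate beyond it; the original paper writes the same idea with a weight shifted by one. The prediction values are invented for the example.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Move from the unconditional prediction towards (and past) the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.10, -0.20, 0.00]  # toy prediction without the text condition
eps_cond = [0.30, -0.10, 0.40]    # toy prediction with the text condition

unguided = apply_guidance(eps_uncond, eps_cond, 0.0)  # ignores the text entirely
plain = apply_guidance(eps_uncond, eps_cond, 1.0)     # ordinary conditional prediction
strong = apply_guidance(eps_uncond, eps_cond, 7.5)    # amplified text influence
```

With a scale above 1 the result lies outside the range spanned by the two predictions, which is what strengthens adherence to the prompt and also what reduces diversity.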

Notable Implementations

Stable Diffusion is an open-source latent diffusion model developed by Stability AI in collaboration with CompVis and Runway ML, released in August 2022. Its open-source nature enabled a large ecosystem of fine-tuned models, custom LoRA (low-rank adaptation) adapters, and derivative tools. Stable Diffusion 3 (2024) introduced the Multimodal Diffusion Transformer (MMDiT) architecture and flow matching training, substantially improving text rendering and compositional accuracy.

DALL-E 3 (2023), developed by OpenAI, is tightly integrated with ChatGPT and offers high caption-following fidelity, meaning the generated image closely matches the description. It uses a modified latent diffusion architecture conditioned on GPT-4-generated extended captions rather than the raw user prompt.

Midjourney is a commercial image generation service with proprietary architecture, known for aesthetically stylised outputs and a strong community of creative users.

Sora (2024), developed by OpenAI, extends diffusion principles to video generation, operating on spacetime patches of compressed video representations to produce coherent videos up to a minute in length from text descriptions.

Beyond images, diffusion models have been applied to audio generation (Stable Audio, AudioLDM), protein structure generation (RFdiffusion from David Baker's lab), and 3D object generation.

Malaysian Context — Diffusion Models in Creative and Commercial Industries

Diffusion models have entered Malaysian creative and commercial industries through both direct tool use and API integration. The primary access pathway has been cloud-based services: OpenAI's DALL-E API (available through Microsoft Azure's Malaysia region), Stability AI's API, and consumer platforms such as Adobe Firefly and Canva's AI image generation feature.

Malaysian advertising agencies, media production companies, and digital marketing firms have incorporated text-to-image diffusion tools into creative workflows. Companies such as Naga DDB Tribal, TBWA Malaysia, and independent digital studios have used diffusion-generated imagery for concept visualisation, social media content, and marketing campaign assets, reducing the time from brief to visual concept from days to hours.

The fashion and retail sector, including Malaysian online retailers on Lazada and Shopee Malaysia, has begun using diffusion models for product photography augmentation — generating lifestyle backgrounds for product images without full photoshoots — and for visualising apparel designs before physical prototyping.

Malaysia's game development community, supported by the Malaysia Digital Content Industry (MDCI) programme under MDEC, has explored diffusion models for concept art generation and texture synthesis, accelerating pre-production for studios such as Streamline Studios Kuala Lumpur and smaller independent developers.

Regulatory and ethical considerations around diffusion-generated content are emerging. The Malaysian Communications and Multimedia Commission (MCMC) has noted the potential for AI-generated synthetic media to be used for disinformation and impersonation, and the broader AI Governance Framework addresses authenticity and disclosure obligations for synthetic media. Malaysia's Personal Data Protection Act 2010 (PDPA) does not yet specifically address synthetic image generation, but CyberSecurity Malaysia has flagged deepfake images and videos as a growing cybersecurity threat.

The open-source Stable Diffusion ecosystem — including community fine-tuned models with Southeast Asian and Malaysian cultural aesthetics — has attracted engagement from Malaysian AI enthusiasts and researchers. Groups operating within the Kuala Lumpur AI and data science community have published LoRA fine-tunes targeting Malay cultural motifs, traditional textile patterns, and local architectural styles.

References

[^1]: Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems, 33.
[^2]: Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., Lacey, K., Goodwin, A., & Rombach, R. (2024). Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. Stability AI Technical Report.
[^3]: Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022.
[^4]: Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications.

Tags: diffusion model, image generation, stable diffusion, DALL-E, generative AI

Type: Generative neural network model
Introduced: 2015 (Sohl-Dickstein et al.); refined 2020–2022
Notable implementations: Stable Diffusion, DALL-E 2/3, Midjourney, Sora
Key modalities: Image, video, audio, 3D
Related: Generative adversarial network, VAE, transformer architecture
