A diffusion model generates images by starting with random noise and iteratively denoising over many steps until a coherent image emerges. This approach underlies Stable Diffusion, Midjourney, Flux, DALL-E (versions 2 and 3), and most modern image generators.
Diffusion is conceptually different from the autoregressive token-by-token approach used by LLMs, but the same underlying transformer architecture often serves as the denoising network. Video diffusion models (Sora, Runway, Kling) extend the same idea to temporally coherent frame sequences.
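The noise-to-image loop described above can be sketched in a few lines. This is a minimal DDPM-style sampler using NumPy, with a stand-in `toy_denoise_predictor` in place of a trained noise-prediction network (in real systems, that is where the U-Net or transformer runs); the schedule values are illustrative assumptions, not any particular model's settings.

```python
import numpy as np

def toy_denoise_predictor(x, t):
    # Hypothetical stand-in for a trained network that predicts the
    # noise present in x at timestep t. Real generators use a large
    # U-Net or transformer here.
    return x * 0.1

def ddpm_sample(shape, steps=50, seed=0):
    """Minimal DDPM-style reverse process: start from pure Gaussian
    noise and iteratively remove predicted noise over `steps` steps."""
    rng = np.random.default_rng(seed)
    # Simple linear noise schedule (an illustrative choice).
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # begin from pure noise
    for t in reversed(range(steps)):
        eps_hat = toy_denoise_predictor(x, t)  # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Intermediate steps re-inject a small amount of noise.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # final step: deterministic
    return x

image = ddpm_sample((8, 8))
print(image.shape)
```

The loop runs the model once per step, which is why diffusion sampling is slower than a single forward pass; production systems reduce the step count with faster samplers or distillation.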