An analytic theory of creativity in convolutional diffusion models

Mason Kamb, Surya Ganguli

International Conference on Machine Learning 2025 · Oral

This talk, presented by Mason Kamb and Surya Ganguli at ICML 2025, introduces an analytic theory that explains the origins of both combinatorial creativity and spatial consistency failures in convolutional diffusion models. The central premise is that these seemingly disparate phenomena stem from a single underlying mechanism: the ability of models like DALL-E 2 to blend concepts into composites such as "Napoleon cat" and "nebula dog," and common artifacts such as extra limbs or distorted features, both trace back to the fundamental architectural constraints of convolutional neural networks (CNNs), specifically **translational equivariance** and **locality**.
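As a quick refresher on what these two constraints mean concretely (a one-dimensional sketch, not from the talk): a convolutional layer applies the same small kernel at every position, so each output depends only on a local input window (locality), and shifting the input shifts the output by the same amount (translational equivariance).

```python
import numpy as np

def circ_conv1d(x, k):
    """Circular cross-correlation: output i depends only on the local
    window x[i], ..., x[i+len(k)-1] (locality), with the same kernel
    applied at every position (weight sharing)."""
    n = len(x)
    return np.array([
        sum(x[(i + j) % n] * k[j] for j in range(len(k)))
        for i in range(n)
    ])

x = np.arange(8, dtype=float)
k = np.array([1.0, -2.0, 1.0])

# Translational equivariance: shifting the input by s shifts the output by s.
lhs = circ_conv1d(np.roll(x, 3), k)
rhs = np.roll(circ_conv1d(x, k), 3)
assert np.allclose(lhs, rhs)
```

Circular boundaries are used here only so the shift identity holds exactly at the edges; the point carries over to the padded convolutions used in practice away from image borders.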

AI review

Kamb and Ganguli present a genuine analytic theory — not a narrative dressed in theorem notation — for why convolutional diffusion models exhibit combinatorial creativity and spatial consistency failures. The central result is clean: under translational equivariance and locality, the Bayes optimal denoiser factors into independent local posteriors, each performing its own uncoordinated belief update over training patches. This is a mechanistically honest explanation for phenomena the community has been documenting empirically without understanding. The 90%+ quantitative agreement between…
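The factorized mechanism described above can be sketched numerically. The following is a hypothetical NumPy illustration, not the paper's implementation: for each pixel, every training patch is scored against the noisy local neighborhood, a softmax posterior is formed over those patches, and the output is the posterior-weighted average of patch centers. Crucially, each pixel runs this belief update independently, with no global coordination; the function name, patch size, and Gaussian scoring are illustrative assumptions.

```python
import numpy as np

def local_patch_denoiser(noisy_image, train_images, patch=3, sigma=0.5):
    """Hypothetical sketch of a per-pixel local-posterior denoiser.

    Each pixel independently scores all training patches against its
    noisy neighborhood, forms a softmax posterior, and outputs the
    posterior-weighted average of patch centers. No coordination
    between pixels (illustrative assumption, not the paper's code).
    """
    r = patch // 2
    H, W = noisy_image.shape
    # Weight sharing means every spatial location of every training
    # image contributes one candidate patch.
    pad_train = [np.pad(img, r, mode='edge') for img in train_images]
    patches = np.stack([
        p[i:i + patch, j:j + patch]
        for p in pad_train
        for i in range(H) for j in range(W)
    ])                                  # (N, patch, patch)
    centers = patches[:, r, r]          # (N,) clean center pixels

    padded = np.pad(noisy_image, r, mode='edge')
    out = np.empty_like(noisy_image)
    for i in range(H):
        for j in range(W):
            nbhd = padded[i:i + patch, j:j + patch]
            # Local log-posterior: Gaussian match of each training
            # patch to this pixel's noisy neighborhood.
            logits = -((patches - nbhd) ** 2).sum(axis=(1, 2)) / (2 * sigma**2)
            w = np.exp(logits - logits.max())
            w /= w.sum()
            out[i, j] = (w * centers).sum()  # independent local update
    return out
```

Because each pixel consults only its own window, adjacent regions can commit to patches drawn from different training images, which is exactly how this mechanism can produce both novel recombinations and globally inconsistent outputs (e.g. extra limbs).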