In-Context Denoising with One-Layer Transformers: Connections between Attention and Associative Memory Retrieval

Matthew Smart, Alberto Bietti, Anirvan Sengupta

International Conference on Machine Learning 2025 · Oral

This talk, presented by Matthew Smart and colleagues at the Flatiron Institute, introduces a novel "in-context denoising" task designed to bridge the theoretical gap between two seemingly distinct architectures: **transformer attention mechanisms** and **associative memory networks**. The core premise is that reframing certain in-context learning (ICL) problems as denoising tasks reveals a precise connection: a single layer of transformer attention can exactly implement one gradient descent step in a modern Hopfield network, a canonical associative memory model.
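The sketch below illustrates this equivalence numerically (it is not the authors' code; the function names, `beta`, and the toy data are hypothetical). With key, query, and value weight matrices set to scaled identities, single-head softmax attention over the context tokens coincides with one unit-step gradient descent update on the modern Hopfield energy.

```python
import numpy as np

def hopfield_gradient_step(query, patterns, beta=1.0):
    """One unit-step gradient descent update on the modern Hopfield energy
    E(q) = -(1/beta) * logsumexp(beta * patterns @ q) + 0.5 * ||q||^2.
    Since grad E(q) = q - patterns.T @ softmax(beta * patterns @ q),
    the update q - grad E(q) simplifies to the retrieval rule below."""
    logits = beta * patterns @ query              # similarity to stored patterns
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # softmax over stored patterns
    return patterns.T @ weights                   # convex combination of patterns

def softmax_attention(query, keys, values, beta=1.0):
    """Single-head softmax attention with scaled-identity weights,
    i.e. effectively W_Q^T W_K = beta * I and W_V = I (the configuration
    the paper identifies as optimal for in-context denoising)."""
    logits = beta * keys @ query
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return values.T @ weights

rng = np.random.default_rng(0)
patterns = rng.normal(size=(8, 16))               # context tokens = stored patterns
noisy = patterns[0] + 0.1 * rng.normal(size=16)   # corrupted query token to denoise

retrieved = hopfield_gradient_step(noisy, patterns, beta=4.0)
attended = softmax_attention(noisy, patterns, patterns, beta=4.0)
assert np.allclose(retrieved, attended)           # attention = one Hopfield step
```

Under these assumptions, denoising a corrupted query amounts to attending over the clean context tokens, which is exactly one retrieval step toward the stored pattern nearest the query.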

AI review

Smart, Bietti, and Sengupta establish a clean theoretical connection between single-layer transformer attention and one step of gradient descent on an associative memory energy landscape, using a carefully constructed in-context denoising task as the interface. The core result, that optimal attention weights converge to a scaled identity matrix and that the resulting operation is exactly a Hopfield gradient step, is stated precisely, empirically validated, and non-trivial. The paper earns a strong accept on the strength of its theoretical architecture and the quality of its empirical…