The Underlying Logic of Language Models: Transformers and Automata
Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
International Conference on Machine Learning 2025 · Tutorial
This tutorial examines the intersection of modern deep learning architectures, specifically **Transformers**, with classical **automata theory** and **algebraic automata theory**. Presented at ICML 2025, the session explores how Transformers, despite processing their input in parallel, can approximate and even exactly implement formalisms traditionally associated with sequential, recurrent computation. The central theme is **structural decomposition**: breaking a complex mathematical object into simpler, well-understood components, much as an integer is decomposed into prime factors. Applied to Transformers, this technique exposes the computational logic underlying the architecture.
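To make the decomposition analogy concrete, here is a minimal Python sketch (an illustration, not code from the tutorial) of the two kinds of "prime" building blocks in Krohn-Rhodes theory: resets, which forget the current state, and permutations generating simple groups, such as cyclic groups of prime order. Every finite automaton factors into a cascade of such pieces, much as every integer factors into primes.

```python
# Illustrative sketch (not from the tutorial): the two kinds of "prime"
# building blocks in Krohn-Rhodes theory, analogous to prime factors.

def reset(target):
    """A reset (flip-flop): a constant map that forgets the current state."""
    return lambda state: target

def cycle(n):
    """A generator of the cyclic group Z_n: advance the counter by one."""
    return lambda state: (state + 1) % n

to_zero = reset(0)   # jump to state 0 regardless of where we are
mod3 = cycle(3)      # a three-state cyclic counter

print(to_zero(2))      # 0: the reset discards the previous state
print(mod3(mod3(0)))   # 2: two steps of the Z_3 counter
```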
AI review
A rigorous theoretical talk connecting Transformer expressiveness to the Krohn-Rhodes decomposition of finite automata. It establishes that log-depth Transformers can recognize all regular languages and constant-depth Transformers can recognize solvable languages, with precise architectural correspondences between attention, MLPs, and residual connections on one side and resets, cyclic groups, and cascade products on the other. The work is technically honest about what is proven versus assumed, situates itself correctly within the ACC0/circuit-complexity literature, and delivers a clean…
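To see why logarithmic depth suffices for regular languages, consider the following Python sketch (an assumed illustration under standard reasoning, not the authors' construction). Each input symbol induces a transition function on the DFA's states, and because function composition is associative, the n compositions can be rearranged into a balanced tree of O(log n) parallel rounds.

```python
# A minimal sketch (assumed illustration, not the tutorial's construction):
# simulating a DFA via a balanced, log-depth composition of per-symbol
# transition functions. The DFA below tracks the parity of 1s in the input.

STATES = (0, 1)  # state 0: even number of 1s seen so far; state 1: odd

def step(symbol):
    """Transition function induced by one input symbol, as a tuple
    mapping each current state to its successor."""
    return (1, 0) if symbol == "1" else (0, 1)  # "1" flips parity, "0" is identity

def compose(f, g):
    """Apply f, then g. Composition is associative, which is what
    licenses reordering the n compositions into a balanced tree."""
    return tuple(g[f[q]] for q in STATES)

def balanced_compose(fns):
    """Combine transition functions pairwise, level by level: O(log n)
    rounds, and each round's compositions are independent (parallelizable)."""
    while len(fns) > 1:
        nxt = [compose(fns[i], fns[i + 1]) for i in range(0, len(fns) - 1, 2)]
        if len(fns) % 2:            # odd count: carry the last function up
            nxt.append(fns[-1])
        fns = nxt
    return fns[0]

word = "1101001"
overall = balanced_compose([step(s) for s in word])
print(overall[0])  # 0: the word has four 1s, so parity is even
```

A sequential scan would take n steps; the balanced tree computes the same overall transition function in logarithmically many rounds, mirroring the depth/parallelism trade-off the review attributes to the talk.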