Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
International Conference on Machine Learning 2025 · Oral
This technical article examines the work presented at ICML 2025 by Sibylle Marcotte, Rémi Gribonval, and Gabriel Peyré on **conserved quantities** in the training dynamics of neural networks, with a particular focus on **ResNets** and **Transformers**. The talk studies the mathematical quantities that remain invariant as a network is trained by **gradient flow**, and the extent to which these invariances carry over to the more practical **Stochastic Gradient Descent (SGD)** setting. By identifying and characterizing these conservation laws, the work sheds light on the **implicit bias** of training dynamics and provides mathematical tools for analyzing the convergence and behavior of complex deep learning models.
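To make the notion of a conservation law concrete, here is a minimal numerical sketch of the classical example for a two-layer *linear* network: under gradient flow on any loss, the "balancedness" matrix $C = W_1 W_1^\top - W_2^\top W_2$ is exactly conserved, and small-step gradient descent preserves it up to discretization error. This is a standard illustrative example of the kind of quantity the talk characterizes, not code from the paper; the toy dimensions, step size, and squared loss are assumptions made here for illustration.

```python
import numpy as np

# Toy setting: two-layer linear network f(x) = W2 @ W1 @ x trained with
# gradient descent on a squared loss. Under exact gradient flow, the
# balancedness matrix C = W1 @ W1.T - W2.T @ W2 is conserved; with a small
# step size it drifts only by a discretization error.
rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 5, 4, 3, 32

X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = rng.standard_normal((d_hid, d_in))
W2 = rng.standard_normal((d_out, d_hid))

def grads(W1, W2):
    """Gradients of 0.5 * ||W2 W1 X - Y||_F^2 with respect to W1 and W2."""
    R = W2 @ W1 @ X - Y                  # residuals, shape (d_out, n)
    return W2.T @ R @ X.T, R @ (W1 @ X).T

def balancedness(W1, W2):
    return W1 @ W1.T - W2.T @ W2

C0 = balancedness(W1, W2)
step = 1e-3
for _ in range(5000):
    g1, g2 = grads(W1, W2)
    W1 -= step * g1
    W2 -= step * g2

drift = np.linalg.norm(balancedness(W1, W2) - C0)
print(f"||C(T) - C(0)||_F = {drift:.2e}")  # small: C is (nearly) conserved
```

Shrinking the step size shrinks the drift, which mirrors the paper's gradient-flow/SGD distinction: the law holds exactly for the continuous-time dynamics and approximately, with a step-size-dependent error, for its discrete counterpart.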
AI review
Marcotte, Gribonval, and Peyré present a rigorous and technically substantial characterization of conservation laws in the gradient flow training dynamics of modern deep learning architectures — CNNs, attention layers, ResNets, and Transformers. The central contribution is an exhaustive identification of conserved quantities for shallow modules and a non-obvious result that skip connections do not expand the space of conservation laws. The theoretical framework is clean, the connection to Lie algebra methodology is principled, and the SGD approximation result is honestly bounded. This is…
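The skip-connection point can be illustrated on a toy residual block: the per-neuron balancedness law known for a plain two-layer ReLU module, $q_i = \|W_1[i,:]\|^2 - \|W_2[:,i]\|^2$, survives unchanged when a skip connection is added, since the skip path does not break the neuron-rescaling symmetry that produces it. The sketch below is an assumption-laden toy (the block, dimensions, and loss are chosen here for illustration and are not the paper's experiments), consistent with the reviewed claim that skip connections do not enlarge the set of conservation laws.

```python
import numpy as np

# Toy residual block f(x) = x + W2 @ relu(W1 @ x), trained with small-step
# gradient descent on a squared loss. The per-neuron balancedness
#     q_i = ||row_i(W1)||^2 - ||col_i(W2)||^2
# is conserved for the plain two-layer ReLU module, and the skip connection
# leaves the underlying rescaling symmetry intact, so q stays (nearly) constant.
rng = np.random.default_rng(1)
d, h, n = 6, 4, 64
X = rng.standard_normal((d, n))
Y = rng.standard_normal((d, n))
W1 = rng.standard_normal((h, d)) * 0.5
W2 = rng.standard_normal((d, h)) * 0.5

def grads(W1, W2):
    """Gradients of 0.5 * ||X + W2 relu(W1 X) - Y||_F^2."""
    A = W1 @ X
    H = np.maximum(A, 0.0)               # ReLU activations
    R = X + W2 @ H - Y                   # residuals of the residual block
    g2 = R @ H.T
    g1 = ((W2.T @ R) * (A > 0)) @ X.T    # backprop through the ReLU mask
    return g1, g2

def balancedness(W1, W2):
    return np.sum(W1**2, axis=1) - np.sum(W2**2, axis=0)

q0 = balancedness(W1, W2)
step = 1e-3
for _ in range(5000):
    g1, g2 = grads(W1, W2)
    W1 -= step * g1
    W2 -= step * g2

print("max drift of q:", np.max(np.abs(balancedness(W1, W2) - q0)))
```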