Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

International Conference on Machine Learning 2025 · Oral

This article examines work presented at ICML 2025 by Shikai Qiu and collaborators from Google DeepMind and New York University, titled "Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks." The talk introduces "scaling collapse," a phenomenon that makes the full training dynamics of large-scale machine learning models understandable and predictable. The core idea is that when normalized appropriately, the loss curves of neural networks trained across vastly different scales collapse onto a single, universal trajectory, revealing an underlying predictability in complex training processes.
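To make the normalization concrete, here is a minimal sketch of how such a collapse plot could be produced. The runs, their compute budgets, and the power-law loss curves are hypothetical stand-ins, not data or code from the paper: each curve is divided by its own final loss and plotted against the fraction of its total training compute.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical compute-optimal runs at three scales. Real usage would
# substitute logged (compute, loss) pairs from training; the power-law
# form below is a stand-in, and collapse holds by construction here,
# whereas the paper's claim is that it emerges empirically.
runs = {
    "small":  {"total_compute": 1e18, "final_loss": 3.2},
    "medium": {"total_compute": 1e19, "final_loss": 2.8},
    "large":  {"total_compute": 1e20, "final_loss": 2.5},
}

fig, (ax_raw, ax_norm) = plt.subplots(1, 2, figsize=(10, 4))
for name, run in runs.items():
    frac = np.linspace(0.01, 1.0, 200)       # fraction of total compute used
    compute = frac * run["total_compute"]
    loss = run["final_loss"] * frac ** -0.3  # synthetic loss curve

    # Raw curves sit far apart; normalized curves lie on one trajectory.
    ax_raw.loglog(compute, loss, label=name)
    ax_norm.plot(frac, loss / run["final_loss"], label=name)

ax_raw.set(xlabel="training compute", ylabel="loss", title="raw loss curves")
ax_norm.set(xlabel="fraction of total compute", ylabel="loss / final loss",
            title="normalized (collapsed)")
ax_norm.legend()
plt.tight_layout()
plt.show()
```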

AI review

This paper presents a genuinely interesting empirical and partially theoretical result: compute-optimally trained neural networks, when their loss curves are normalized by final loss and total compute, collapse onto a single universal trajectory, with precision that beats the noise floor set by random seed variation. The phenomenon is demonstrated across architectures and datasets, and the ablations are well designed, showing that muP and Chinchilla-optimality are necessary conditions for collapse. A lightweight theoretical model, borrowing SGD-on-quadratics intuition to decompose loss…
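The review's reference to "SGD-on-quadratics intuition" is cut off above, but the textbook decomposition it alludes to can be sketched as follows. This is the standard analysis of SGD on a quadratic loss, not the paper's specific derivation; the commuting-noise assumption below is ours.

```latex
% SGD on a quadratic L(\theta) = \tfrac{1}{2}\theta^\top H \theta with
% learning rate \eta and gradient noise \xi_t of covariance \Sigma:
%   \theta_{t+1} = \theta_t - \eta\,(H\theta_t + \xi_t).
% Assuming \Sigma commutes with H, the expected loss splits into a
% deterministic decay term and a noise-driven term:
\mathbb{E}\!\left[L(\theta_t)\right]
  = \underbrace{\tfrac{1}{2}\,\theta_0^{\top} H\,(I-\eta H)^{2t}\,\theta_0}_{\text{deterministic decay}}
  + \underbrace{\tfrac{\eta^{2}}{2}\,\operatorname{tr}\!\Big[H\,\Sigma \sum_{s=0}^{t-1}(I-\eta H)^{2s}\Big]}_{\text{noise contribution}}
```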