Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

Santhosh Karnik, Anna Veselovska, Mark Iwen, Felix Krahmer

International Conference on Machine Learning 2025 · Oral

This talk, presented at ICML 2025, addresses one of the most persistent theoretical puzzles in modern machine learning: the remarkable efficacy of **gradient descent (GD)**. Specifically, the authors ask why GD, despite operating on highly non-convex loss landscapes in heavily over-parameterized models, consistently converges to solutions that generalize well to unseen data. The focus is not on the well-studied "lazy training" or **Neural Tangent Kernel (NTK)** regime, but on the more intricate setting of **small random initializations**, where the model parameters change significantly during training.
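
To make the setting concrete, here is a minimal NumPy sketch, not the authors' code and a toy fitting problem rather than the exact setup analyzed in the paper: gradient descent with small random initialization on an over-parameterized tubal factorization X = U * V, where * denotes the t-product (facewise matrix multiplication in the Fourier domain along the third axis). The dimensions, initialization scale `alpha`, and step size `eta` are illustrative choices.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x k x n3) and B (k x n2 x n3): facewise
    matrix products of the FFTs taken along the third (tube) axis."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum("ikt,kjt->ijt", Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))

def t_transpose(A):
    """Tensor transpose: transpose each frontal slice, then reverse
    the order of slices 2 through n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

rng = np.random.default_rng(0)
n1, n2, n3 = 20, 20, 8
r_true, r_over = 2, 20          # heavily over-parameterized: r_over >> r_true

# Ground truth of low tubal rank r_true, normalized for a stable step size.
T = t_product(rng.standard_normal((n1, r_true, n3)),
              rng.standard_normal((r_true, n2, n3)))
T /= np.linalg.norm(T)

# Small random initialization: scale alpha << 1 (illustrative value).
alpha, eta = 1e-3, 0.1
U = alpha * rng.standard_normal((n1, r_over, n3))
V = alpha * rng.standard_normal((r_over, n2, n3))

for step in range(5000):
    R = t_product(U, V) - T      # residual of the current fit
    # Gradients of 0.5 * ||U * V - T||_F^2 in the t-product algebra.
    U, V = (U - eta * t_product(R, t_transpose(V)),
            V - eta * t_product(t_transpose(U), R))

# Implicit low-tubal-rank bias: per Fourier slice, only about r_true
# singular values of the learned product are non-negligible, even
# though the factorization allows tubal rank up to r_over.
Xh = np.fft.fft(t_product(U, V), axis=2)
s = np.linalg.svd(Xh[:, :, 0], compute_uv=False)
print("loss:", np.linalg.norm(t_product(U, V) - T) ** 2)
print("top singular values (Fourier slice 0):", np.round(s[:5], 4))
```

Because the t-product diagonalizes under the FFT, the training dynamics decouple into independent matrix-factorization problems across the Fourier slices; this is why the matrix-factorization intuition transfers to the tubal setting at all.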

AI review

A theoretically motivated extension of implicit regularization results from matrix factorization to tubal tensor factorizations under gradient descent with small random initialization. The work sits in a well-defined and legitimate lineage — Gunasekar et al. on matrix factorization, Li et al. on Hadamard products — and the research question is the right one to ask. However, based on what the article actually conveys, the contribution reads as a technically competent generalization rather than a conceptually transformative one. The article itself is frustratingly thin on the actual theorems…
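
For readers unfamiliar with the lineage the review invokes, the matrix-factorization result being generalized can be stated informally as follows; this paraphrase of Gunasekar et al. (2017) is added here for context and is not part of the talk itself:

```latex
% Informal statement (Gunasekar et al., 2017): run gradient flow on the
% over-parameterized least-squares objective
\dot{U}(t) = -\nabla_U \tfrac{1}{2}\,\bigl\|\mathcal{A}\bigl(U(t)U(t)^\top\bigr) - y\bigr\|_2^2,
\qquad U(0) = \alpha\, U_{\mathrm{init}} .
% Then, as the initialization scale alpha tends to zero (proved for
% commuting measurement operators, conjectured more broadly),
\lim_{\alpha \to 0} \, U(\infty)\, U(\infty)^\top
\;=\; \operatorname*{arg\,min}_{X \succeq 0,\ \mathcal{A}(X) = y} \; \|X\|_* .
% The tubal analogue replaces the matrix product by the t-product
% and matrix rank by tubal rank.
```

In words: gradient flow from a vanishingly small initialization selects, among all matrices consistent with the measurements, the one of minimum nuclear norm, an implicit bias toward low rank that no explicit regularizer in the objective enforces.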