Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness

Konstantinos Oikonomidis, Jan Quan, Emanuel Laude, Panagiotis Patrinos

International Conference on Machine Learning 2025 · Oral

This talk, presented by Konstantinos Oikonomidis on joint work with Jan Quan, Emanuel Laude, and Panagiotis Patrinos, introduces a novel framework of **nonlinearly preconditioned gradient methods** for unconstrained minimization problems. While standard gradient descent (GD) is a ubiquitous solver, its performance guarantees often hinge on the restrictive assumption of global **Lipschitz smoothness** of the cost function. In modern machine learning this assumption frequently fails to hold, leading to challenges such as arduous hyperparameter tuning, a need for excessively small step sizes, or reliance on computationally expensive line-search procedures to guarantee convergence.
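As a rough illustration of the idea (not the paper's actual algorithm or notation), the minimal Python sketch below applies a nonlinear map to the gradient before taking a step. Gradient clipping, which the framework covers as a special case, plays the role of the preconditioner here, and the quartic test objective is a hypothetical example whose gradient is not globally Lipschitz; the function names and parameters are illustrative only.

```python
import numpy as np

def clip_preconditioner(g, delta=1.0):
    """Nonlinear preconditioner: rescale the gradient so its norm never
    exceeds `delta` (gradient clipping, one instance the framework covers)."""
    norm = np.linalg.norm(g)
    return g if norm <= delta else (delta / norm) * g

def preconditioned_gradient_descent(grad_f, x0, step=0.1,
                                    precond=clip_preconditioner,
                                    max_iters=1000, tol=1e-8):
    """Generic nonlinearly preconditioned gradient step:
    x_{k+1} = x_k - step * precond(grad_f(x_k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * precond(g)
    return x

# Hypothetical example: f(x) = x^4, whose gradient 4x^3 is not globally Lipschitz.
grad = lambda x: 4.0 * x**3
x_star = preconditioned_gradient_descent(grad, x0=np.array([3.0]), step=0.05)
print(x_star)  # iterates shrink toward the minimizer at 0
```

Swapping `clip_preconditioner` for a different nonlinear map yields other members of the family discussed in the talk, such as simplified Adam-type updates.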

AI review

A rigorous theoretical contribution that formalizes a unified framework for nonlinearly preconditioned gradient methods under anisotropic smoothness, a generalization of Lipschitz smoothness that accommodates many practical ML objectives whose gradients are not globally Lipschitz. The work provides clean derivations from the majorization-minimization principle, proves convergence in both convex and non-convex settings, and gives a satisfying theoretical account of why gradient clipping and simplified Adam-type methods work. The framework is more than a rebranding exercise: the connection to phi-convexity and optimal…