General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

Kwangjun Ahn, Gagik Magakyan, Ashok Cutkosky

International Conference on Machine Learning 2025 · Oral

This article examines a significant theoretical advance in optimization for machine learning, presented at ICML 2025. The talk, titled "General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization," by Kwangjun Ahn, Gagik Magakyan, and Ashok Cutkosky, introduces a theoretical framework that extends the applicability and understanding of **schedule-free Stochastic Gradient Descent (SGD)**. Traditionally, neural network training relies heavily on carefully tuned learning rate schedules, which are difficult to tune and typically must be specified in advance for the entire training budget. Schedule-free methods eliminate the need for such schedules, making optimization more robust and easier to use; a sketch of the update rule follows.
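For reference, here is a minimal sketch of schedule-free SGD in the form popularized by Defazio et al.'s "The Road Less Scheduled" (2024), the method this paper analyzes. The function name `loss_grad`, the hyperparameter values, and the identification of the interpolation weight `beta` with the paper's κ are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def schedule_free_sgd(loss_grad, x0, lr=0.1, beta=0.9, steps=1000):
    """Minimal sketch of schedule-free SGD (after Defazio et al., 2024).

    Maintains three sequences:
      z: the base SGD iterate,
      y: the interpolation of x and z where gradients are evaluated,
      x: a running average of the z iterates, returned as the solution.
    No learning rate schedule is used; online averaging replaces decay.
    """
    z = x0.copy()  # base iterate
    x = x0.copy()  # running average / evaluation point
    for t in range(1, steps + 1):
        # Gradients are taken at the interpolated point y, not at x or z.
        y = (1.0 - beta) * z + beta * x
        g = loss_grad(y)
        z = z - lr * g
        # Update x as the online uniform average of the z iterates.
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z
    return x

# Illustrative usage on a toy quadratic: gradient of f(w) = 0.5 * ||w||^2.
w = schedule_free_sgd(lambda w: w, x0=np.ones(3))
```

Note the role of `beta`: values near one keep the evaluation point close to the averaged iterate, which is exactly the regime the paper's theory singles out.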

AI review

A rigorous theoretical contribution that fills a genuine gap: prior work on schedule-free SGD had convergence guarantees only in the convex regime, leaving practitioners to rely on empirical heuristics in a fundamentally nonconvex world. This paper provides a generalized online-to-nonconvex conversion framework and proves that schedule-free SGD achieves optimal rates for nonconvex, nonsmooth problems, with the added payoff of a principled explanation of why κ must be chosen near one. The framework is intellectually clean: the connection between OMD (online mirror descent) and schedule-free SGD emerges naturally from…
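As context for the "optimal rates" claim: in nonsmooth nonconvex optimization the standard target is a (δ, ε)-Goldstein stationary point, and the optimal stochastic oracle complexity for reaching one was established in the earlier online-to-nonconvex conversion work of Cutkosky, Mehta, and Orabona (2023). The sketch below restates those definitions as background from that prior literature; it is not checked against this paper's exact statements.

```latex
% A point x is (\delta, \epsilon)-Goldstein stationary for f if some
% convex combination of gradients taken within distance \delta of x
% has norm at most \epsilon:
\exists\, g \in \mathrm{conv}\{\nabla f(y) : \|y - x\| \le \delta\}
  \quad \text{such that} \quad \|g\| \le \epsilon.

% With a stochastic gradient oracle, the optimal number of oracle calls
% to find such a point scales as
O\!\left(\delta^{-1}\epsilon^{-3}\right),
% the rate the review says schedule-free SGD is shown to match.
```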