LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won't Fail)
Junsu Kim, Jaeyeon Kim, Ernest Ryu
International Conference on Machine Learning 2025 · Oral
Low-Rank Adaptation (LoRA) has emerged as a cornerstone technique for parameter-efficient fine-tuning of large pre-trained models. By freezing the pre-trained weights and training only a low-rank update to selected layers, LoRA sharply reduces the number of trainable parameters, enabling efficient adaptation to downstream tasks while typically matching full fine-tuning performance. This talk, presented by Ernest Ryu at ICML 2025, gives a rigorous analysis of LoRA's convergence properties. The central thesis matches the provocative title: LoRA training provably converges to a low-rank global minimum, or it fails loudly, and the latter is argued to be highly improbable in practice.
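The low-rank update scheme described above can be sketched in a few lines. This is an illustrative toy, not code from the paper: the shapes, the `alpha` scaling factor, and the zero-initialization of `B` follow common LoRA practice but are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4

# Pre-trained weight: frozen during fine-tuning.
W0 = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors: the effective weight is W0 + (alpha / r) * B @ A.
A = rng.standard_normal((r, d_in)) * 0.01  # small random init (common practice)
B = np.zeros((d_out, r))                   # zero init, so training starts at W0
alpha = 8.0                                # illustrative scaling hyperparameter

def lora_forward(x):
    """Forward pass: base output W0 @ x plus the rank-r correction."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

# Parameter savings: only the factors are trained.
trainable = r * (d_in + d_out)  # parameters in A and B
full = d_in * d_out             # parameters in a full update of W0
```

With `d_in = 128`, `d_out = 64`, and `r = 4`, the update trains 768 parameters instead of 8192, and the correction `B @ A` has rank at most `r` by construction.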
AI review
This is a legitimate theoretical contribution to the foundations of LoRA fine-tuning. The central result — that second-order stationary points of the LoRA objective are either low-rank global minima or high-rank, large-magnitude spurious minima — is a meaningful dichotomy theorem proved without NTK linearization, replacing it with restricted strong convexity and smoothness conditions borrowed from the matrix sensing literature. The nuclear norm equivalence of per-factor weight decay is clean and non-trivial in this context. The honest gap — that 'it probably won't fail' is argued…
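The per-factor weight-decay point the review calls "clean and non-trivial" rests on a classical variational identity from the matrix factorization literature: penalizing the squared Frobenius norms of the factors is, at the optimum, equivalent to penalizing the nuclear norm of their product (valid whenever the factor rank is at least $\operatorname{rank}(W)$):

```latex
\min_{B, A \,:\, BA = W} \; \tfrac{1}{2}\left( \|B\|_F^2 + \|A\|_F^2 \right) \;=\; \|W\|_*
```

This is why ordinary weight decay applied separately to the LoRA factors $B$ and $A$ acts as an implicit nuclear-norm regularizer on the update $BA$, biasing training toward low-rank solutions.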