Best Talks at International Conference on Machine Learning 2025
Hand-picked from in-depth reviewer verdicts. View all talks at International Conference on Machine Learning 2025 →
- 1. Game-theoretic Statistics and Sequential Anytime-Valid Inference (SAVI): A Martingale Theory of Evidence — Aaditya Ramdas
Aaditya Ramdas’s tutorial at ICML 2025 introduced attendees to the rapidly evolving field of **game-theoretic statistics** and **Sequential Anytime-Valid Inference (SAVI)**, presenting it as a foundational shift in how we approach…
- 2. An analytic theory of creativity in convolutional diffusion models — Mason Kamb, Surya Ganguli
This talk, presented by Mason Kamb and Surya Ganguli at ICML 2025, introduces a groundbreaking analytic theory aimed at explaining the origins of combinatorial creativity and spatial consistency failures in convolutional diffusion models…
- 3. Emergence in non-neural models: grokking modular arithmetic via average gradient outer product — Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Misha Belkin
In this compelling talk, Neil Mallinar and his co-authors present groundbreaking research challenging the conventional understanding of generalization in machine learning, particularly the phenomenon known as **grokking**. Traditionally…
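As a minimal sketch of the kind of task studied in grokking work like this, the snippet below builds a toy modular-addition dataset — learning (a + b) mod p from one-hot inputs. The modulus p = 7 is chosen here for illustration only; the exact encoding and modulus used in the paper are assumptions, not taken from the talk.

```python
# Toy dataset for the modular-arithmetic task commonly used in grokking
# studies: learn (a + b) mod p from one-hot encoded input pairs.
import itertools
import numpy as np

p = 7  # small modulus for illustration; papers often use larger primes such as 97
pairs = list(itertools.product(range(p), repeat=2))

# Each input concatenates the one-hot encodings of a and b; the label is (a + b) mod p.
X = np.array([np.concatenate([np.eye(p)[a], np.eye(p)[b]]) for a, b in pairs])
y = np.array([(a + b) % p for a, b in pairs])
print(X.shape, y.shape)  # (49, 14) (49,)
```

A model trained on a subset of these pairs can memorize the training set long before it generalizes to held-out pairs — the delayed transition that "grokking" refers to.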
- 4. How Do Large Language Monkeys Get Their Power (Laws)? — Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
This talk, presented by Rylan Schaeffer and Joshua Kazdan at ICML 2025, delves into the fascinating and seemingly paradoxical scaling laws observed when using **Large Language Models (LLMs)**, affectionately termed "large language…
- 5. Near-Optimal Decision Trees in a SPLIT Second — Varun Babbar, Hayden McTavish, Cynthia Rudin, Margo Seltzer
This article delves into the groundbreaking work presented by Varun Babbar and Hayden McTavish at ICML 2025, detailing their paper "Near-Optimal Decision Trees in a SPLIT Second." The talk introduces **SPLIT** and **Re-SPLIT**, two novel…
- 6. Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks — Shikai Qiu, Lechao Xiao, Andrew Wilson, Jeffrey Pennington, Atish Agarwala
This article delves into a groundbreaking discovery presented at ICML 2025 by Shikai Qiu and collaborators from Google DeepMind and New York University, titled "Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained…
- 7. Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise — Ilias Diakonikolas, Mingchen Ma, Lisheng Ren, Christos Tzamos
This talk delves into the computational complexity of **multiclass linear classification (MLC)**, a fundamental problem in machine learning, particularly when faced with **random classification noise (RCN)**. Multiclass linear…
- 8. Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions — Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen
This talk, presented by Kulin Shah at ICML 2025, delves into the fundamental mechanisms and challenges of **Masked Diffusion Models (MDMs)**, particularly concerning their approach to token ordering in language modeling. The work, a…
- 9. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — Sergey Ioffe, Christian Szegedy
This article delves into the transformative impact of **Batch Normalization (BN)**, a technique that earned Sergey Ioffe and Christian Szegedy the prestigious ICML 2025 Test of Time Award. Presented by Ioffe, this talk offers a…
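The core operation from Ioffe and Szegedy's paper — normalizing each feature over the mini-batch, then applying a learnable scale and shift — can be sketched in a few lines. This is a simplified training-mode forward pass only (no running statistics for inference, no backward pass); the function name and shapes are illustrative, not from the talk.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a (batch, features) array."""
    mu = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)            # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta    # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(64, 10))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

With gamma = 1 and beta = 0, each output feature has (near-)zero mean and unit variance over the batch, which is the stabilizing effect the paper attributes to reducing internal covariate shift.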
- 10. Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice: Lecture I: The Generalizability of Diffusion Models — Qing Qu, Yuxin Chen, Liyue Shen
This article delves into the foundational mathematical aspects of **diffusion models**, specifically focusing on their remarkable **generalization** capabilities. Presented as the first lecture in a tutorial series titled "Harnessing Low…
- 11. Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice: Lecture II: Sampling Theory for Diffusion Models — Qing Qu, Yuxin Chen, Liyue Shen
This talk, the second lecture in a comprehensive tutorial on diffusion models, delves into the intricate mathematical foundations governing the **sampling stage** of these powerful generative models. Presented by Yuxin Chen, with…
- 12. The Underlying Logic of Language Models: Transformers and Automata — Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
This talk delves into the fascinating intersection of modern deep learning architectures, specifically **Transformers**, and classical **automata theory** and **algebraic automata theory**. Presented by Jiaoda Li, Ryan Cotterell, Franz…