International Conference on Machine Learning 2025
The 42nd International Conference on Machine Learning. Main-program talks spanning invited keynotes, tutorials, oral presentations, and Test of Time award talks, covering the frontier of machine learning research.
- AI's Models of the World, and Ours — Jon Kleinberg
Jon Kleinberg, a renowned professor from Cornell University, delivered a thought-provoking talk at ICML 2025, delving into the profound implications of AI's burgeoning capabilities, particularly its…
- Adaptive Alignment: Designing AI for a Changing World — Frauke Kreuter
In an insightful talk at ICML 2025, social scientist Frauke Kreuter presented a compelling argument for a more nuanced and data-centric approach to **AI alignment**, emphasizing the critical…
- Closing the Loop: Machine Learning for Optimization and Discovery — Andreas Krause
In this insightful talk at ICML 2025, Andreas Krause, a leading researcher in machine learning, presented a compelling vision for "Closing the Loop: Machine Learning for Optimization and Discovery."…
- Generative AI's Collision with Copyright Law — Pamela Samuelson
Pamela Samuelson, a distinguished legal scholar, delivered a compelling talk at ICML 2025, dissecting the intricate and often contentious relationship between generative AI technologies and…
- What to optimize for – from robot arms to frontier AI — Anca Dragan
In this insightful keynote at ICML 2025, Anca Dragan, who co-leads post-training for Gemini at Google DeepMind, delves into one of the most fundamental and persistent challenges in AI: understanding…
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — Sergey Ioffe, Christian Szegedy
This article delves into the transformative impact of **Batch Normalization (BN)**, a technique that earned Sergey Ioffe and Christian Szegedy the prestigious ICML 2025 Test of Time Award. Presented…
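Since the award-winning technique is fully specified in the original paper, its core transform is easy to sketch. The NumPy snippet below shows the training-time batch-norm step from Ioffe & Szegedy: per-feature mini-batch standardization followed by a learned affine map. The running statistics used at inference time are omitted, and the toy mini-batch is an illustrative assumption.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a (batch, features) array, following
    Ioffe & Szegedy (2015): standardize each feature by its mini-batch
    mean/variance, then apply the learned scale-and-shift (gamma, beta)."""
    mu = x.mean(axis=0)                       # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # standardized activations
    return gamma * x_hat + beta               # learned affine map

x = np.random.randn(64, 8) * 3.0 + 5.0        # toy mini-batch, shifted and scaled
y = batch_norm_train(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0s and ~1s
```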
- Calibration and Bias in Algorithms, Data, and Models: a tutorial on metrics and plots for measuring calibration, bias, fairness, reliability, and robustness — Mark Tygert
In this comprehensive tutorial at ICML 2025, Mark Tygert delivers a critical examination of widely used methods for measuring calibration, bias, fairness, reliability, and robustness in machine…
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Brief introduction to DP — Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
This talk, delivered at ICML 2025 by a team of prominent researchers, addresses the critical and escalating challenge of data privacy in the age of large-scale machine learning, particularly with…
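As context for the tutorial's privacy definitions, here is a minimal sketch of one textbook DP primitive, the Gaussian mechanism from Dwork & Roth (2014). The tutorial's own synthetic-data pipelines are far more involved; the dataset, clipping bound, and privacy budget below are illustrative assumptions.

```python
import numpy as np

def gaussian_mechanism(true_value, l2_sensitivity, epsilon, delta, rng):
    """Release a statistic with (epsilon, delta)-DP via the classic
    Gaussian mechanism: add N(0, sigma^2) noise with
    sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon.
    This calibration is valid for epsilon in (0, 1)."""
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_value + rng.normal(0.0, sigma, size=np.shape(true_value))

rng = np.random.default_rng(0)
incomes = rng.uniform(0, 100_000, size=1_000)
# With values clipped to [0, 100000], changing one record moves the mean
# by at most 100000 / n, which bounds the sensitivity of the mean query.
sensitivity = 100_000 / len(incomes)
private_mean = gaussian_mechanism(incomes.mean(), sensitivity,
                                  epsilon=0.5, delta=1e-5, rng=rng)
print(incomes.mean(), private_mean)
```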
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Methods for DP synthetic TABULAR data — Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
This talk, delivered by Natalia Ponomareva, delves into the intricate world of generating Differentially Private (DP) synthetic data, specifically focusing on image and tabular modalities. Building…
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Practical components for DP synthetic data system — Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
This talk, part of a broader tutorial on differentially private (DP) synthetic data, shifts focus from the algorithmic foundations to the critical system-level considerations required for deploying…
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Techniques for DP synthetic IMAGE data creation — Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
This talk, presented by Alex Bie and his colleagues Natalia Ponomareva, Sergei Vassilvitskii, and Peter Kairouz from Google, delves into the critical and evolving field of generating differentially…
- DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Techniques for creating DP synthetic TEXT data — Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
This talk, delivered by Natalia Ponomareva and her esteemed colleagues at ICML 2025, provides a foundational and critical exploration into the realm of **Differential Privacy (DP)**, specifically…
- Flowing Through Continuous-Time Generative Models: A Clear and Systematic Tour — Qiang Liu
This tutorial, presented by Qiang Liu at ICML 2025, offers a comprehensive and systematic exploration of **continuous-time generative models**, with a particular focus on **Rectified Flow (RF)**…
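For readers who want the tutorial's central object in code form, below is a minimal sketch of the basic rectified-flow training objective: interpolate between noise and data along a straight line and regress a network onto the constant velocity. The toy network and data are assumptions for illustration; the tutorial itself covers much more (reflow, distillation, samplers).

```python
import torch
import torch.nn as nn

# Velocity-field network: takes (x_t, t) and predicts a velocity.
net = nn.Sequential(nn.Linear(2 + 1, 128), nn.SiLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def rectified_flow_step(x1):
    """One training step of the basic rectified-flow objective: draw noise
    x0, interpolate x_t = (1 - t) x0 + t x1, and regress the network onto
    the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)                 # source samples: Gaussian noise
    t = torch.rand(x1.shape[0], 1)            # uniform time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1              # linear interpolation
    v_target = x1 - x0                        # constant-speed velocity target
    v_pred = net(torch.cat([xt, t], dim=1))
    loss = ((v_pred - v_target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

data = torch.randn(256, 2) * 0.3 + torch.tensor([2.0, -1.0])  # toy target
for _ in range(100):
    rectified_flow_step(data)
```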
- Game-theoretic Statistics and Sequential Anytime-Valid Inference (SAVI): A Martingale Theory of Evidence — Aaditya Ramdas
Aaditya Ramdas’s tutorial at ICML 2025 introduced attendees to the rapidly evolving field of **game-theoretic statistics** and **Sequential Anytime-Valid Inference (SAVI)**, presenting it as a…
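To make the martingale idea concrete, here is a minimal sketch of one classic SAVI instance: a betting-style test of H0: mean ≤ 1/2 for outcomes in [0, 1], stopped by Ville's inequality. The fixed bet size λ is an illustrative assumption; the tutorial develops far more general e-processes and adaptive betting strategies.

```python
import numpy as np

def betting_test(xs, lam=0.5, alpha=0.05):
    """Anytime-valid test of H0: mean <= 1/2 for outcomes in [0, 1], via
    the wealth process W_n = prod_i (1 + lam * (x_i - 1/2)) with lam in
    [0, 2], a nonnegative supermartingale under H0. Ville's inequality
    gives P(sup_n W_n >= 1/alpha) <= alpha, so we may peek after every
    observation and stop as soon as W_n >= 1/alpha."""
    wealth = 1.0
    for n, x in enumerate(xs, start=1):
        wealth *= 1.0 + lam * (x - 0.5)
        if wealth >= 1.0 / alpha:
            return n, wealth           # reject H0 at (data-dependent) time n
    return None, wealth                # never rejected

rng = np.random.default_rng(1)
stop, w = betting_test(rng.binomial(1, 0.7, size=500))  # biased coin
print(stop, w)
```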
- Generative AI Meets Reinforcement Learning — Amy Zhang, Benjamin Eysenbach
This comprehensive tutorial, "Generative AI Meets Reinforcement Learning," delivered by Amy Zhang and Benjamin Eysenbach at ICML 2025, explores the profound and often overlooked synergies between…
- Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice: Lecture I: The Generalizability of Diffusion Models — Qing Qu, Yuxin Chen, Liyue Shen
This article delves into the foundational mathematical aspects of **diffusion models**, specifically focusing on their remarkable **generalization** capabilities. Presented as the first lecture in a…
- Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice: Lecture II: Sampling Theory for Diffusion Models — Qing Qu, Yuxin Chen, Liyue Shen
This talk, the second lecture in a comprehensive tutorial on diffusion models, delves into the intricate mathematical foundations governing the **sampling stage** of these powerful generative…
- Harnessing Low Dimensionality in Diffusion Models: From Theory to Practice: Lecture III: Diffusion Inverse Solvers for Scientific Applications — Qing Qu, Yuxin Chen, Liyue Shen
This article delves into the third lecture of a comprehensive tutorial on diffusion models, focusing specifically on their application as inverse solvers for scientific problems. Presented by Liyue…
- Jailbreaking LLMs and Agentic Systems: Attacks, Defenses, and Evaluations — Hamed Hassani, Amin Karbasi, Alexander Robey
This comprehensive tutorial, presented by Hamed Hassani, Amin Karbasi, and Alexander Robey at ICML 2025, delves into the critical and rapidly evolving landscape of **jailbreaking attacks** against…
- Modern Methods in Associative Memory: A Universal Language for Associative Memory — Dmitry Krotov, Benjamin Hoover, Parikshit Ram
This talk, presented by Benjamin Hoover and Dmitry Krotov at ICML 2025, introduces a groundbreaking "universal language" for describing and building a wide array of associative memory (AM) models…
- Modern Methods in Associative Memory — Dmitry Krotov, Benjamin Hoover, Parikshit Ram
This article delves into the "Modern Methods in Associative Memory" tutorial presented at ICML 2025 by Dmitry Krotov, Benjamin Hoover, and Parikshit Ram. The talk addresses a critical challenge…
- Modern Methods in Associative Memory: AM and broader AI — Dmitry Krotov, Benjamin Hoover, Parikshit Ram
This talk, delivered by Parikshit Ram, delves into **associative memory (AM)** networks, reframing them not as abstract theoretical constructs from physics or neuroscience, but as tangible machine…
- The Underlying Logic of Language Models: Introduction — Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
This talk, an introductory segment of a broader tutorial presented at ICML 2025, delves into the fundamental question of "what language models can compute." Presented by Anej Svete, Jiaoda Li, and…
- The Underlying Logic of Language Models: Transformers and Automata — Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
This talk delves into the fascinating intersection of modern deep learning architectures, specifically **Transformers**, and classical **automata theory** and **algebraic automata theory**…
- The Underlying Logic of Language Models: Transformers and Circuits — Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
This talk, delivered by Anej Svete as part of a broader session with Jiaoda Li, Ryan Cotterell, and Franz Nowak at ICML 2025, delves into a foundational understanding of transformer-based language…
- The Underlying Logic of Language Models: Transformers and Formal Logics — Jiaoda Li, Ryan Cotterell, Franz Nowak, Anej Svete
This talk, presented at ICML 2025 by Jiaoda Li and collaborators, delves into the fundamental expressive power of **Transformer** architectures by establishing formal equivalences with fragments of…
- Training Neural Networks at Any Scale — Leena Chennuru Vankadara, Volkan Cevher
This detailed technical article delves into the intricacies of training neural networks, particularly focusing on the challenges and opportunities presented by scaling models to unprecedented sizes…
- Tutorial on Mechanistic Interpretability for Language Models — Ziyu Yao, Daking Rai
This article delves into the intricate world of **Mechanistic Interpretability (MI)** for language models, based on a comprehensive tutorial presented by Ziyu Yao and Daking Rai at ICML 2025. The…
- A Generalization Theory for Zero-Shot Prediction — Ronak Mehta, Zaid Harchaoui
In this insightful talk from ICML 2025, Ronak Mehta, in collaboration with his advisor Zaid Harchaoui, presents a groundbreaking theoretical framework for understanding the generalization…
- A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization — Nuojin Cheng, Leonard Papenmeier, Stephen Becker, Luigi Nardi
This talk introduces a groundbreaking unified framework, **Variational Entropy Search (VES)**, that bridges the conceptual and practical gap between two of the most prominent acquisition functions…
- ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence — Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, Qingming Huang
This talk introduces **ABKD (Alpha-Beta Knowledge Distillation)**, a novel framework designed to enhance model compression through knowledge distillation by addressing the limitations of traditional…
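ABKD's α-β divergence family is specific to the paper, so the sketch below shows only the standard baseline it generalizes: temperature-scaled forward-KL distillation in the style of Hinton et al. (2015). Treat it as the reference point ABKD improves on, not as the paper's method; the toy logits are assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Classic distillation loss: KL divergence between temperature-
    softened teacher and student distributions. ABKD replaces this fixed
    forward-KL with a tunable alpha-beta divergence family."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

s = torch.randn(4, 10, requires_grad=True)   # student logits (toy)
t = torch.randn(4, 10)                       # teacher logits (toy)
print(kd_loss(s, t).item())
```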
- Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies — Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Gaurav Jain, Oren Pereg, Moshe Wasserblat, David Harel
Large Language Model (LLM) inference, particularly the **autoregressive decoding** process, remains a significant bottleneck in many applications. Each token generation typically requires a full…
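As background for the paper's setting, below is a minimal sketch of the standard lossless speculative-sampling acceptance rule (Leviathan et al., 2023) for a single token with a shared vocabulary. The paper's contribution, handling heterogeneous draft/target vocabularies, is precisely what this sketch does not do; the distributions p and q are toy assumptions.

```python
import numpy as np

def speculative_accept(p, q, rng):
    """One step of lossless speculative sampling: a cheap draft model
    proposes a token from q; the target distribution p accepts it with
    probability min(1, p/q), otherwise resamples from the renormalized
    residual max(p - q, 0). The output is distributed exactly as p."""
    x = rng.choice(len(q), p=q)                    # draft proposal
    if rng.uniform() < min(1.0, p[x] / q[x]):
        return x                                   # accepted draft token
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)          # corrected resample

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])    # target model's distribution (toy)
q = np.array([0.3, 0.4, 0.3])    # draft model's distribution (toy)
samples = [speculative_accept(p, q, rng) for _ in range(20_000)]
print(np.bincount(samples) / len(samples))   # empirically ~ [0.6, 0.3, 0.1]
```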
- AdaSplash: Adaptive Sparse Flash Attention — Nuno Gonçalves, Marcos V. Treviso, Andre Martins
The talk "AdaSplash: Adaptive Sparse Flash Attention" introduces a groundbreaking approach to enhance the efficiency and scalability of attention mechanisms in large-scale transformer models…
- Addressing Misspecification in Simulation-based Inference through Data-driven Calibration — Antoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Jörn Jacobsen, Marco Cuturi
This talk by Juan L. Gamella and Antoine Wehenkel, representing a collaborative effort from Apple, introduces **Warped Posterior Estimator (WORP)**, a novel framework designed to enhance…
- Algorithm Development in Neural Networks: Insights from the Streaming Parity Task — Loek van Rossem, Andrew Saxe
In this insightful talk from ICML 2025, Loek van Rossem and Andrew Saxe delve into the fascinating phenomenon of how neural networks implicitly learn computational algorithms from training data…
- All-Purpose Mean Estimation over $\mathbb{R}$: Optimal Sub-Gaussianity with Outlier Robustness and Low Moments Performance — Jasper Lee, Walter McKelvie, Maoyuan Song, Paul Valiant
This talk, presented by Jasper Lee from UC Davis, delves into the foundational problem of **mean estimation** over the real numbers, a seemingly simple task that is, in fact, remarkably complex in…
- An Improved Clique-Picking Algorithm for Counting Markov Equivalent DAGs via Super Cliques Transfer — Lifu Liu, Shiyuan He, Jianhua Guo
This talk introduces an innovative approach to a fundamental problem in causal inference: efficiently counting the DAGs in a **Markov equivalence class (MEC)**. Presented by Lifu Liu, Shiyuan He, and…
- An analytic theory of creativity in convolutional diffusion models — Mason Kamb, Surya Ganguli
This talk, presented by Mason Kamb and Surya Ganguli at ICML 2025, introduces a groundbreaking analytic theory aimed at explaining the origins of combinatorial creativity and spatial consistency…
- Auditing $f$-differential privacy in one run — Saeed Mahloujifar, Luca Melis, Kamalika Chaudhuri
In an era where state-of-the-art machine learning models are increasingly vulnerable to sophisticated privacy attacks, **differential privacy (DP)** has emerged as the gold standard for providing…
- AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses — Nicholas Carlini, Edoardo Debenedetti, Javier Rando, Milad Nasr, Florian Tramer
In this insightful talk from ICML 2025, Nicholas Carlini and his co-authors present **AutoAdvExBench**, a novel, proxy-free benchmark designed to evaluate the capability of large language models…
- AutoGFM: Automated Graph Foundation Model with Adaptive Architecture Customization — Haibo Chen, Xin Wang, Zeyang Zhang, Haoyang Li, Ling Feng, Wenwu Zhu
This article delves into "AutoGFM: Automated Graph Foundation Model with Adaptive Architecture Customization," a significant contribution presented at ICML 2025 by Haibo Chen and his co-authors. The…
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation — Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You
This article delves into the ICML 2025 talk, "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation," presented by Chenyu You, an Assistant Professor at Stony Brook, on behalf of…
- Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs — Jie Hu, Yi-Ting Ma, Do-Young Eun
At ICML 2025, Jie Hu from NC State University, alongside colleagues Yi-Ting Ma and Professor Do-Young Eun, presented a significant advancement in the realm of Markov Chain Monte Carlo (MCMC)…
- Blink of an eye: a simple theory for feature localization in generative models — Marvin Li, Aayush Karan, Sitan Chen
Marvin Li, Aayush Karan, and Sitan Chen presented a groundbreaking theoretical framework at ICML 2025 addressing the pervasive phenomenon of "critical windows" in generative AI models. This talk…
- Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark — Yunzhuo Hao, Jiawei Gu, Huichen Wang, Linjie Li, Zhengyuan Yang, Lijuan Wang, Yu Cheng
In an era where large language models (LLMs) are rapidly evolving into **multimodal large language models (MLLMs)**, their ability to process and generate information across various modalities—text…
- CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction — Junlong Li, Daya Guo, Dejian Yang, Runxin Xu, Yu Wu, Junxian He
The talk introduces **CodeIO**, a novel framework designed to enhance the general reasoning capabilities of large language models (LLMs) by leveraging "code in the wild." Presented by a proxy on…
- CollabLLM: From Passive Responders to Active Collaborators — Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
In an era where **Large Language Models (LLMs)** are increasingly integrated into daily workflows, from drafting documents to solving complex scientific problems, the quality of human-LLM…
- ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features — Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover, Pinar Yanardag, Polo Chau
In this insightful talk from ICML 2025, Alec Helbling, a PhD student at Georgia Tech, introduced **ConceptAttention**, a novel method designed to visualize the textual concepts embedded within the…
- Conformal Prediction as Bayesian Quadrature — Jake Snell, Thomas Griffiths
In this insightful talk from ICML 2025, Jake Snell, co-authored with Thomas Griffiths, presents a novel theoretical framework that bridges two seemingly disparate areas of machine learning…
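The procedure being reinterpreted is classic split conformal prediction, so a minimal sketch of that baseline is useful context. The synthetic calibration data below is an assumption, and the paper's Bayesian-quadrature view of the quantile step is not reproduced here.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal prediction for regression: absolute-residual
    scores on a held-out calibration set yield an interval with
    finite-sample coverage >= 1 - alpha under exchangeability.
    Assumes n is large enough that the quantile index k <= n."""
    scores = np.abs(cal_y - cal_pred)             # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))       # conformal quantile index
    qhat = np.sort(scores)[k - 1]                 # k-th smallest score
    return test_pred - qhat, test_pred + qhat

rng = np.random.default_rng(0)
cal_pred, cal_y = rng.normal(size=500), rng.normal(size=500)
lo, hi = split_conformal_interval(cal_pred, cal_y, test_pred=0.0)
print(lo, hi)
```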
- Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration — Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang
This talk introduces a critical challenge in **Constrained Reinforcement Learning (CRL)**: the pervasive issue of cost underestimation bias, which leads to unsafe exploration in safety-critical…
- Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination — Kunal Jha, Wilka Carvalho, Yancheng Liang, Simon Du, Max Kleiman-Weiner, Natasha Jaques
This talk, presented by Kunal Jha at ICML 2025, introduces **Cross-Environment Cooperation (CEC)**, a novel paradigm designed to tackle the formidable challenge of **zero-shot multi-agent…
- DeFoG: Discrete Flow Matching for Graph Generation — Yiming Qin, Manuel Madeira, Dorina Thanou, Pascal Frossard
The talk "DeFoG: Discrete Flow Matching for Graph Generation," presented by Manuel Madeira and Yiming Qin from EPFL, introduces a novel and highly flexible framework for generating graphs that…
- DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs — Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun
In an era increasingly defined by the capabilities of large language models (LLMs), the computational demands associated with their deployment remain a significant bottleneck. This talk introduces…
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents — Rui Yang, Hanyang (Jeremy) Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang
The talk introduces **EmbodiedBench**, a novel and comprehensive benchmarking suite designed to evaluate multi-modal large language models (MLLMs) for their capabilities in vision-driven embodied…
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product — Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Misha Belkin
In this compelling talk, Neil Mallinar and his co-authors present groundbreaking research challenging the conventional understanding of generalization in machine learning, particularly the…
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs — Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
This talk, presented by Niels Warncke at ICML 2025, unveils a critical and surprising phenomenon dubbed **"Emergent Misalignment."** The core discovery is that finetuning large language models…
- Equivalence is All: A Unified View for Self-supervised Graph Learning — Yejiang Wang, Yuhai Zhao, Zhengkui Wang, Ling Li, Jiapu Wang, Fangting Li, Miaomiao Huang, Shirui Pan, Xingwei Wang
This talk introduces **GALE (Graph Automorphic Equivalence Learning)**, a novel self-supervised learning framework for graph-structured data that unifies and leverages the fundamental concept of…
- Expected Variational Inequalities — Brian Zhang, Ioannis Anagnostides, Emanuel Tewolde, Ratip Emin Berker, Gabriele Farina, Vincent Conitzer, Tuomas Sandholm
Variational Inequalities (VIs) represent a fundamental and highly expressive framework in mathematics and computer science, capable of modeling a vast array of problems from optimization to game…
- Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards — Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramer, Chiyuan Zhang
This talk, presented by Christopher Choquette-Choo on behalf of lead author Yangsibo Huang and a large team of collaborators, delves into the vulnerabilities inherent in **voting-based benchmarks**…
- Flowing Datasets with Wasserstein over Wasserstein Gradient Flows — Clément Bonet, Christophe Vauthier, Anna Korba
This talk introduces a novel and computationally efficient framework for comparing and transforming labeled datasets, a critical challenge in modern machine learning. Presented by Clément Bonet…
- Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection — Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin'ichi Satoh
In an era defined by the exponential growth of data, the deep learning paradigm has witnessed unprecedented advancements. However, the prevailing wisdom that "more data equals better performance" is…
- From Weight-Based to State-Based Fine-Tuning: Further Memory Reduction on LoRA with Parallel Control — Chi Zhang, REN Lianhai, Jingpu Cheng, Qianxiao Li
This talk, presented by Chi Zhang and co-authors REN Lianhai, Jingpu Cheng, and Qianxiao Li from the National University of Singapore, introduces a novel perspective on **Parameter-Efficient…
- Fully Dynamic Euclidean Bi-Chromatic Matching in Sublinear Update Time — Gramoz Goranci, Peter Kiss, Neel Patel, Martin Seybold, Eva Szilagyi, Da Wei Zheng
This talk, presented at ICML 2025 by a collaborative team including Gramoz Goranci, Peter Kiss, Neel Patel, Martin Seybold, Eva Szilagyi, and Da Wei Zheng, addresses the challenging problem of…
- Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton — Chengmei Niu, Zhenyu Liao, Zenan Ling, Michael Mahoney
This talk, presented by Liam Hutchinson on behalf of authors Chengmei Niu, Zhenyu Liao, Zenan Ling, and Michael Mahoney, addresses a critical theoretical and practical challenge in **randomized…
- General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization — Kwangjun Ahn, Gagik Magakyan, Ashok Cutkosky
This article delves into a significant theoretical advancement in the field of optimization for machine learning, presented at ICML 2025. The talk, titled "General framework for online-to-nonconvex…
- Generative Social Choice: The Next Generation — Niclas Boehmer, Sara Fish, Ariel Procaccia
In an era increasingly shaped by digital discourse and the proliferation of diverse viewpoints, the challenge of accurately and proportionally summarizing collective opinion is paramount. This talk…
- Hierarchical Refinement: Optimal Transport to Infinity and Beyond — Peter Halmos, Julian Gold, Xinhao Liu, Benjamin Raphael
Optimal Transport (OT) is a powerful mathematical framework for comparing probability distributions by finding the least-cost mapping or coupling between them. It has garnered significant attention…
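For orientation, here is a minimal sketch of the standard entropic-OT Sinkhorn solver, whose quadratic memory footprint is exactly what hierarchical schemes like the paper's aim to avoid. The cost matrix and uniform marginals are toy assumptions; this is not the paper's algorithm.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iters=500):
    """Entropic-regularized OT between histograms a and b with cost
    matrix C, via alternating Sinkhorn scaling updates. Materializing
    the full n x n kernel is the O(n^2) bottleneck at scale."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):             # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

n = 64
x, y = np.linspace(0, 1, n), np.linspace(0, 1, n)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost (toy)
a = b = np.full(n, 1.0 / n)              # uniform marginals
P = sinkhorn(a, b, C)
print(P.sum(), (P.sum(axis=1) - a).max())  # total mass ~1, marginals match
```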
- High-Dimensional Prediction for Sequential Decision Making — Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie
This talk introduces a novel framework for **high-dimensional prediction in sequential decision-making**, particularly in online adversarial environments. Presented by Ramya Ramalingam, a PhD…
- How Do Large Language Monkeys Get Their Power (Laws)? — Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
This talk, presented by Rylan Schaeffer and Joshua Kazdan at ICML 2025, delves into the fascinating and seemingly paradoxical scaling laws observed when using **Large Language Models (LLMs)**…
- ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks — Saurabh Jha, Rohan Arora, Yuji Watanabe, Takumi Yanagawa, Yinfang Chen, Jackson Clark, Bhavya Bhavya, Mudit Verma, Harshit Kumar, Hirokuni Kitahara, Noah Zheutlin, Saki Takano, Divya Pathak, Felix George, Xinbo Wu, Bekir Turkkan, Gerard Vanloo, Michael Nidd, Ting Dai, Oishik Chatterjee, Pranjal Gupta, Suranjana Samanta, Pooja Aggarwal, Rong Lee, Jae-wook Ahn, Debanjana Kar, Amit Paradkar, Yu Deng, Pratibha Moogi, Prateeti Mohapatra, Naoki Abe, Chandrasekhar Narayanaswami, Tianyin Xu, Lav Varshney, Ruchi Mahindru, Anca Sailer, Laura Shwartz, Daby Sow, Nicholas Fuller, Ruchir Puri
In the rapidly evolving landscape of artificial intelligence, the promise of **AI agents** to automate complex, real-world tasks has garnered immense attention. However, the true capabilities of…
- Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent — Santhosh Karnik, Anna Veselovska, Mark Iwen, Felix Krahmer
This talk, presented at ICML 2025 by a collaborative team including Santhosh Karnik, Anna Veselovska, Mark Iwen, and Felix Krahmer, delves into one of the most fundamental and persistent theoretical…
- Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance — Shogo Iwazaki, Shion Takeno
This talk, presented by Shogo Iwazaki and Shion Takeno at ICML 2025, delves into the theoretical underpinnings of **Gaussian Process (GP) bandits**, a fundamental problem in sequential…
- Improving the Scaling Laws of Synthetic Data with Deliberate Practice — Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob, Florian Bordes, Pietro Astolfi, Melissa Hall, Jakob Verbeek, Michal Drozdzal, Adriana Romero-Soriano
In the rapidly evolving landscape of machine learning, the quest for ever more performant models often hinges on the availability of vast, high-quality datasets. However, acquiring and labeling…
- In-Context Denoising with One-Layer Transformers: Connections between Attention and Associative Memory Retrieval — Matthew Smart, Alberto Bietti, Anirvan Sengupta
This talk, presented by Matthew Smart and his colleagues at the Flatiron Institute, delves into a novel "in-context denoising" task designed to bridge the theoretical gap between two seemingly…
- Inductive Moment Matching — Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song
In the rapidly evolving landscape of generative AI, particularly in visual domains, **diffusion models** and **flow matching** have emerged as dominant paradigms, powering sophisticated…
- LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models — Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa Doan, Chandan Reddy
This talk, presented by Chandan Reddy from Virginia Tech on behalf of his PhD students Parshin Shojaee and Ngoc Hieu Nguyen and co-authors, introduces **LLM-SRBench**, a novel benchmark designed to…
- Layer by Layer: Uncovering Hidden Representations in Language Models — Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, Ravid Shwartz-Ziv
In the rapidly evolving landscape of large language models (LLMs), a set of deeply ingrained assumptions often guides their application: that the final layers yield the most optimal embeddings for…
- Learning Dynamics in Continual Pre-Training for Large Language Models — Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, Daniel Zeng
This talk, presented by Xingjin Wang from the University of Chinese Academy of Sciences, delves into the intricate **learning dynamics** of **continual pre-training (CPT)** for **Large Language…
- Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction — Xiang Fu, Brandon Wood, Luis Barroso-Luque, Daniel S. Levine, Meng Gao, Misko Dzamba, Larry Zitnick
This article delves into a pivotal talk presented at ICML 2025 by Brandon Wood and Meng Gao from Meta's Fair Chemistry team, highlighting groundbreaking work led by Xiang Fu. The presentation…
- Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes — Weihan Li, Yule Wang, Chengrui Li, Anqi Wu
In this insightful talk from ICML 2025, Weihan Li and collaborators from Georgia Tech presented a novel approach to decipher the complex, dynamic communication patterns within the brain. The work…
- Learning dynamics in linear recurrent neural networks — Alexandra Proca, Clémentine Dominé, Murray Shanahan, Pedro Mediano
This talk, presented by Alexandra Proca at ICML 2025, delves into the intricate mechanisms of learning within **linear recurrent neural networks (RNNs)**. The research addresses a critical gap in…
- Learning with Expected Signatures: Theory and Applications — Lorenzo Lucchese, Mikko S. Pakkanen, Almut E. D. Veraart
This article delves into the theoretical advancements and practical implications presented in the ICML 2025 talk "Learning with Expected Signatures: Theory and Applications." Delivered by Lorenzo…
- LoRA Training Provably Converges to a Low-Rank Global Minimum Or It Fails Loudly (But it Probably Won't Fail) — Junsu Kim, Jaeyeon Kim, Ernest Ryu
Low-Rank Adaptation (LoRA) has emerged as a cornerstone technique for parameter-efficient fine-tuning of large pre-trained models. By introducing low-rank updates to specific layers, LoRA…
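Since the theory concerns the standard LoRA parameterization, a minimal sketch of that layer is useful context: a frozen base weight plus a scaled trainable low-rank update, initialized so training starts exactly at the pre-trained model. The hyperparameters below (r=8, alpha=16) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W0 plus a trainable low-rank update:
    y = x W0^T + (alpha / r) * x A^T B^T, i.e. dW = B @ A, with A drawn
    from a small Gaussian and B = 0 so the update starts at zero."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pre-trained W0
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([4, 512]) 8192
```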
- LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently — Yuanhe Zhang, Fanghui Liu, Yudong Chen
This talk, presented by Fanghui Liu at ICML 2025, introduces **LoRA-One**, a novel approach to fine-tuning **Large Language Models (LLMs)** that redefines the trade-off between performance and…
- Long-Form Speech Generation with Spoken Language Models — Se Jin Park, Julian Salazar, Aren Jansen, Keisuke Kinoshita, Yong Man Ro, RJ Skerry-Ryan
This article delves into the groundbreaking work presented by Se Jin Park and Julian Salazar on **SpeechSSM**, a novel approach to generating long-form, coherent, and expressive speech using spoken…
- MGD$^3$ : Mode-Guided Dataset Distillation using Diffusion Models — Jeffrey A. Chan-Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah
This article delves into MGD$^3$, a novel approach to **Dataset Distillation (DD)** that leverages the power of **diffusion models** with a unique **mode-guided sampling strategy**. Presented by…
- Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics — Herman Chau, Helen Jenne, Davis Brown, Jesse He, Mark Raugas, Sara Billey, Henry Kvinge
This talk, presented by Herman Chau, introduces a novel collection of datasets designed to advance the application of AI in pure mathematics, specifically focusing on the often-overlooked aspects of…
- Mixture of Lookup Experts — Shibo Jie, Yehui Tang, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng, Yunhe Wang
This presentation introduces **Mixture of Lookup Experts (MoLA)**, a novel architectural design aimed at making large language models (LLMs) more friendly for deployment on edge devices such as…
- Model Immunization from a Condition Number Perspective — Amber Yijia Zheng, Cedar Site Bai, Brian Bullins, Raymond A. Yeh
In an era defined by the rapid advancement and widespread accessibility of generative AI, the responsible deployment of powerful open-weight models like **Stable Diffusion** and **DeepFloyd**…
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning — Fangwen Wu, Lechao Cheng, Shengeng Tang, Xiaofeng Zhu, Chaowei Fang, Dingwen Zhang, Meng Wang
The field of continual or incremental learning stands as a critical frontier in artificial intelligence, aiming to build models that can acquire new knowledge sequentially without forgetting…
- Near-Optimal Decision Trees in a SPLIT Second — Varun Babbar, Hayden McTavish, Cynthia Rudin, Margo Seltzer
This article delves into the groundbreaking work presented by Varun Babbar and Hayden McTavish at ICML 2025, detailing their paper "Near-Optimal Decision Trees in a SPLIT Second." The talk…
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning — Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao
This talk, presented by Lu Ma from Mila and the University of Montreal, introduces a groundbreaking approach to scaling Deep Reinforcement Learning (Deep RL) models through the strategic application…
- Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? — Konrad Mundinger, Max Zimmer, Aldo Kiem, Christoph Spiegel, Sebastian Pokutta
This talk, presented by researchers from the IOL Research Lab in Berlin, delves into the fascinating intersection of machine learning and pure mathematics, presenting a compelling case study of…
- Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness — Konstantinos Oikonomidis, Jan Quan, Emanuel Laude, Panagiotis Patrinos
This talk, presented by Konstantinos Oikonomidis and his co-authors Jan Quan, Emanuel Laude, and Panagiotis Patrinos, introduces a novel framework for **nonlinearly preconditioned gradient methods**…
- Normalizing Flows are Capable Generative Models — Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista Martin, Navdeep Jaitly, Joshua M Susskind
This talk, presented by Shuangfei Zhai from Apple's machine learning research team, challenges the long-held notion that **Normalizing Flows (NFs)** are inferior generative models compared to more…
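The defining computation of a normalizing flow is the exact change-of-variables log-likelihood, $\log p_x(x) = \log p_z(f(x)) + \log |\det J_f(x)|$. The sketch below shows it for the simplest invertible map, an elementwise affine transform with a standard-normal base; this is a didactic instance, not the paper's architecture.

```python
import math
import torch

def affine_flow_logprob(x, scale, shift):
    """Exact log-likelihood under the change-of-variables formula, with
    the invertible map f(x) = (x - shift) / scale and a standard-normal
    base density. For this elementwise map, log|det J_f| = -sum log|scale|."""
    z = (x - shift) / scale                               # invert the flow
    base_logp = -0.5 * (z ** 2).sum(-1) \
                - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    log_det = -torch.log(scale.abs()).sum(-1)             # Jacobian term
    return base_logp + log_det

x = torch.randn(16, 3) * 2.0 + 1.0
logp = affine_flow_logprob(x, scale=torch.full((3,), 2.0), shift=torch.ones(3))
print(logp.shape)   # torch.Size([16])
```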
- On Differential Privacy for Adaptively Solving Search Problems via Sketching — Shiyuan Feng, Ying Feng, George Li, Zhao Song, David Woodruff, Lichen Zhang
This talk, presented by Zhao Song from UC Berkeley, introduces a novel framework for achieving **differential privacy (DP)** in the context of **adaptively solving search problems** through the…
- On Path to Multimodal Generalist: General-Level and General-Bench — Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Weiming Wu, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang
This talk, presented by Tianjie Ju and a large team of co-authors from numerous institutions, introduces a novel framework for evaluating general-purpose multimodal foundation models. Titled "On…
- One-Step Generalization Ratio Guided Optimization for Domain Generalization — Sumin Cho, Dongwon Kim, Kwangsu Kim
In the rapidly evolving landscape of machine learning, the ability of models to generalize effectively to unseen data distributions remains a paramount challenge. This talk, presented by Sumin Cho…
- Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection — Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, Li Yuan
The proliferation of sophisticated AI models capable of generating highly realistic images has introduced a critical challenge: the reliable detection of **AI-generated images (AIGI)**. This talk…
- Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models — Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, Hongfu Liu
In the rapidly evolving landscape of deep learning, the quality and relevance of training data are paramount. Models, especially large language models (LLMs), often learn undesirable behaviors or…
- Partition First, Embed Later: Laplacian-Based Feature Partitioning for Refined Embedding and Visualization of High-Dimensional Data — Erez Peterfreund, Ofir Lindenbaum, Yuval Kluger, Boris Landa
In this compelling talk from ICML 2025, Erez Peterfreund, a postdoc at Yale University, presented novel research on **Laplacian-based feature partitioning** for enhancing the embedding and…
- Polynomial-Delay MAG Listing with Novel Locally Complete Orientation Rules — Tian-Zuo Wang, Wen-Bo Du, Zhi-Hua Zhou
This talk, presented by Tian-Zuo Wang from Nanjing University, delves into a critical challenge within **causal inference**: efficiently identifying all possible causal structures consistent with…
- Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation — D. Sculley, William Cukierski, Phil Culliton, Sohier Dane, Maggie Demkin, Ryan Holbrook, Addison Howard, Paul Mooney, Walter Reade, Meg Risdal, Nate Keating
In this thought-provoking position talk at ICML 2025, Walter Reade, a data scientist on the Kaggle competitions team, alongside a distinguished group of colleagues, presented a compelling argument…
- Position: AI Safety should prioritize the Future of Work — Sanchaita Hazra, Bodhisattwa Prasad Majumder, Tuhin Chakrabarty
This talk, presented by Tuhin Chakrabarty on behalf of himself and co-authors Sanchaita Hazra and Bodhisattwa Prasad Majumder, delivers a compelling position statement arguing for a fundamental…
- Position: Certified Robustness Does Not (Yet) Imply Model Security — Andrew C. Cullen, Paul MONTAGUE, Sarah Erfani, Benjamin Rubinstein
In this thought-provoking position paper presented at ICML 2025, Dr. Andrew Cullen, alongside collaborators Paul Montague, Sarah Erfani, and Benjamin Rubinstein from the University of Melbourne and…
- Position: Current Model Licensing Practices are Dragging Us into a Quagmire of Legal Noncompliance — Moming Duan, Mingzhe Du, Rui Zhao, Mengying Wang, Yinghui Wu, Nigel Shadbolt, Bingsheng He
In an era increasingly defined by the rapid proliferation and pervasive reuse of **foundation models**, the legal landscape governing their deployment and derivative works is becoming alarmingly…
- Position: Generative AI Regulation Can Learn from Social Media Regulation — Ruth Elisabeth Appel
In this thought-provoking talk at ICML 2025, Ruth Elisabeth Appel, a postdoctoral fellow at Stanford University at the time of the research and now with Anthropic, presented a compelling argument…
- Position: Medical Large Language Model Benchmarks Should Prioritize Construct Validity — Ahmed Alaa, Thomas Hartvigsen, Niloufar Golchini, Shiladitya Dutta, Frances Dean, Inioluwa Raji, Travis Zack
In a critical presentation at ICML 2025, Tom Hartvigsen, faculty at the University of Virginia, delivered a compelling position paper on behalf of a large collaborative team from Berkeley and UCSF…
- Position: Not All Explanations for Deep Learning Phenomena Are Equally Valuable — Alan Jeffares, Mihaela van der Schaar
In this thought-provoking position paper presented at ICML 2025, Alan Jeffares, a PhD student with over three and a half years of experience researching deep learning phenomena such as **double…
- Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It — Jillian Fisher, Ruth Elisabeth Appel, Chan Young Park, Yujin Potter, Liwei Jiang, Taylor Sorensen, Shangbin Feng, Yulia Tsvetkov, Margaret Roberts, Jennifer Pan, Dawn Song, Yejin Choi
In an era where artificial intelligence increasingly influences decision-making across various domains, the notion of **political neutrality** in AI has emerged as a critical, yet often elusive…
- Position: Principles of Animal Cognition to Improve LLM Evaluations — Sunayana Rane, Cyrus Kirkman, Graham Todd, Amanda Royka, Ryan Law, Erica Cartmill, Jacob Foster
In an era where large language models (LLMs) exhibit increasingly sophisticated and emergent behaviors, evaluating their true cognitive capabilities remains a profound challenge. This insightful…
- Position: Probabilistic Modelling is Sufficient for Causal Inference — Bruno Mlodozeniec, David Krueger, Richard E Turner
This talk, presented by Bruno Mlodozeniec and co-authored with David Krueger and Richard E Turner, challenges a foundational debate in machine learning and statistics: whether specialized "causal…
- Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards — Jaeho Kim, Yunseok Lee, Seulki Lee
In this insightful and timely talk at ICML 2025, Jaeho Kim, along with Yunseok Lee and Seulki Lee from UNIST South Korea, presented a compelling position paper addressing the escalating crisis in AI…
- Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All — Ermis Soumalias, Jakob Heiss, Jakob Weissteiner, Sven Seuken
This talk introduces a groundbreaking hybrid auction mechanism that masterfully integrates two distinct types of queries—**demand queries** and **value queries**—powered by machine learning…
- ReferSplat: Referring Segmentation in 3D Gaussian Splatting — Shuting He, Guangquan Jie, Changshuo Wang, Yun Zhou, Shuming Hu, Guanbin Li, Henghui Ding
The talk "ReferSplat: Referring Segmentation in 3D Gaussian Splatting" introduces a groundbreaking framework for enabling natural language interaction with 3D scenes, specifically within the context…
- Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction — Vaishnavh Nagarajan, Chen Wu, Charles Ding, Aditi Raghunathan
This article delves into the ICML 2025 outstanding paper, "Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction," presented by Vaishnavh Nagarajan and Chen…
- Rényi Neural Processes — Xuesong Wang, He Zhao, Edwin V. Bonilla
Neural Processes (NPs) represent a powerful paradigm in machine learning, offering a flexible framework for context-based prediction and robust uncertainty estimation. At ICML 2025, Xuesong Wang…
- SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs — Xin Su, Man Luo, Kris Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard
In the rapidly evolving landscape of artificial intelligence, **multimodal Large Language Models (LLMs)** have demonstrated remarkable capabilities in understanding and generating content across…
- STAIR: Improving Safety Alignment with Introspective Reasoning — Yichi Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu
In an era where large language models (LLMs) are rapidly integrating into critical applications—from medical advice to policy drafting—concerns regarding their safety and trustworthiness have…
- SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? — Samuel Miserendino, Michele Wang, Tejal Patwardhan, Johannes Heidecke
In an era where large language models (LLMs) are increasingly demonstrating advanced reasoning and code generation capabilities, a critical question emerges: how well do these models perform on…
- Sanity Checking Causal Representation Learning on a Simple Real-World System — Juan L. Gamella, Simon Bing, Jakob Runge
This talk, presented by Simon Bing and Juan L. Gamella, delves into a fundamental yet often overlooked question within the field of **Causal Representation Learning (CRL)**: Does it actually work on…
- Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks — Shikai Qiu, Lechao Xiao, Andrew Wilson, Jeffrey Pennington, Atish Agarwala
This article delves into a groundbreaking discovery presented at ICML 2025 by Shikai Qiu and collaborators from Google DeepMind and New York University, titled "Scaling Collapse Reveals Universal…
- Score Matching with Missing Data — Josh Givens, Song Liu, Henry Reeve
This talk, presented by Song Liu and spearheaded by his PhD student Josh Givens, introduces a novel framework for applying **score matching** techniques to datasets afflicted by missing values…
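As a baseline for the paper's missing-data extension, here is a minimal sketch of complete-data denoising score matching (Vincent, 2011). The network, noise scale, and data are illustrative assumptions, and the paper's handling of missing values is not shown.

```python
import torch
import torch.nn as nn

def dsm_loss(score_net, x, sigma=0.1):
    """Denoising score matching: perturb data with Gaussian noise, then
    regress the network onto the score of the noising kernel,
    (x - x_tilde) / sigma^2. Minimizing this matches the score of the
    sigma-smoothed data density without any normalizing constant."""
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise
    target = (x - x_tilde) / sigma ** 2        # equals -noise / sigma
    return ((score_net(x_tilde) - target) ** 2).sum(-1).mean()

net = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))
x = torch.randn(128, 2)                        # toy complete data
print(dsm_loss(net, x).item())
```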
- Statistical Collusion by Collectives on Learning Platforms — Etienne Gauthier, Francis Bach, Michael Jordan
This article delves into the groundbreaking work presented by Etienne Gauthier, Francis Bach, and Michael Jordan at ICML 2025, titled "Statistical Collusion by Collectives on Learning Platforms."…
- Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise — Ilias Diakonikolas, Mingchen Ma, Lisheng Ren, Christos Tzamos
This talk delves into the computational complexity of **multiclass linear classification (MLC)**, a fundamental problem in machine learning, particularly when faced with **random classification…
- Statistical Test for Feature Selection Pipelines by Selective Inference — Tomohiro Shiraishi, Tatsuya Matsukawa, Shuichi Nishino, Ichiro Takeuchi
In the rapidly evolving landscape of AI-driven scientific discovery, the ability to identify truly meaningful patterns amidst vast datasets is paramount. This talk, presented by Ichiro Takeuchi from…
- Strategy Coopetition Explains the Emergence and Transience of In-Context Learning — Aaditya Singh, Ted Moskovitz, Sara Dragutinović, Felix Hill, Stephanie Chan, Andrew Saxe
The talk "Strategy Coopetition Explains the Emergence and Transience of In-Context Learning" by Aaditya Singh and collaborators delves into one of the most intriguing and foundational phenomena in…
- Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings — Angéline Pouget, Mohammad Yaghini, Stephan Rabanser, Nicolas Papernot
In the rapidly evolving landscape of machine learning, deploying models into production environments presents a unique set of challenges, particularly when the operational data distribution differs…
- Sundial: A Family of Highly Capable Time Series Foundation Models — Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, Mingsheng Long
Yong Liu from Tsinghua University presented **Sundial**, a novel family of **time series foundation models** designed to overcome long-standing challenges in time series forecasting. The talk delves…
- Temporal Difference Flows — Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Rémi Munos, Alessandro Lazaric, Ahmed Touati
This article delves into "Temporal Difference Flows," a groundbreaking work presented at ICML 2025 by Jesse Farebrother, who conducted this research during an internship at Meta, alongside a team of…
- The Value of Prediction in Identifying the Worst-Off — Unai Fischer Abaigar, Christoph Kern, Juan Perdomo
This article delves into a crucial dilemma faced by public agencies: how to effectively allocate scarce resources to individuals most in need, especially when leveraging machine learning (ML) for…
- The dark side of the forces: assessing non-conservative force models for atomistic machine learning — Filippo Bigi, Marcel Langer, Michele Ceriotti
This talk delves into the critical challenges and innovative solutions for integrating machine learning models into classical mechanics simulations, particularly in the realm of atomistic molecular…
- Theoretical Limitations of Ensembles in the Age of Overparameterization — Niclas Dern, John Cunningham, Geoff Pleiss
This talk, presented by Geoff Pleiss, explores the theoretical underpinnings of **deep ensembles** in the contemporary landscape of **overparameterized models**. The research, primarily driven by…
- Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions — Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen
This talk, presented by Kulin Shah at ICML 2025, delves into the fundamental mechanisms and challenges of **Masked Diffusion Models (MDMs)**, particularly concerning their approach to token ordering…
- Training a Generally Curious Agent — Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Rahman, Zico Kolter, Jeff Schneider, Russ Salakhutdinov
In this compelling talk from ICML 2025, Yiding Jiang, a PhD student at Carnegie Mellon University, along with collaborators Fahim Tajwar, Abitha Thankaraj, Sumaita Rahman, Zico Kolter, Jeff…
- Transformative or Conservative? Conservation laws for ResNets and Transformers — Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
This detailed technical article explores the groundbreaking work presented at ICML 2025 by Sibylle Marcotte, Rémi Gribonval, and Gabriel Peyré on **conserved quantities** during the training…
- VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data — Thomas Zeng, Shuibai Zhang, Shutong Wu, Christian Classen, Daewon Chae, Ethan Ewer, Minjae Lee, Heeju Kim, Wonjun Kang, Jackson Kunde, Ying Fan, Jungtaek Kim, HYUNG IL KOO, Kannan Ramchandran, Dimitris Papailiopoulos, Kangwook Lee
In the rapidly evolving landscape of artificial intelligence, **Large Language Models (LLMs)** have demonstrated remarkable capabilities in understanding and generating human-like text. However…
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models — Hila Chefer, Uriel Singer, Amit Zohar, Yuval Kirstain, Adam Polyak, Yaniv Taigman, Lior Wolf, Shelly Sheynin
In the rapidly evolving landscape of video generation, significant strides have been made in rendering visually stunning and high-fidelity content. However, as Hila Chefer from Meta AI and Tel Aviv…
- VideoRoPE: What Makes for Good Video Rotary Position Embedding? — Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Jian Tong, Haodong Duan, Qipeng Guo, Jiaqi Wang, Xipeng Qiu, Dahua Lin
The proliferation of video data and the increasing demand for sophisticated long-context video understanding tasks present significant challenges for current deep learning models, particularly in…
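As background, the sketch below implements standard 1-D rotary position embedding (Su et al., 2021), the mechanism VideoRoPE generalizes to video's temporal and spatial axes. The head dimension and base are the usual illustrative defaults; nothing video-specific is included.

```python
import torch

def rope(x, positions, base=10_000.0):
    """Standard 1-D RoPE: split the feature dimension into 2-D pairs and
    rotate pair i at position p by angle p * base^(-2i/d). Relative
    offsets then appear as relative rotations inside q.k dot products."""
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = positions[:, None].float() * inv_freq[None, :]   # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                       # paired features
    return torch.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], dim=-1).flatten(-2)

q = torch.randn(16, 64)                  # (seq_len, head_dim), toy query
q_rot = rope(q, torch.arange(16))
print(q_rot.shape)                       # torch.Size([16, 64])
```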
- What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities — Wendong Bu, Yang Wu, Qifan Yu, Minghe Gao, Bingchen Miao, Zhenkui Zhang, Kaihang Pan, liyunfei, Mengze Li, Wei Ji, Juncheng Li, Siliang Tang, Yueting Zhuang
This talk introduces **OmniBench**, a novel subtask-based benchmark designed to provide a scalable and multi-dimensional evaluation framework for virtual agents. Presented by Yaoxin Li from the…
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking — Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang
The talk introduces **rStar-Math**, a novel framework designed to empower small Language Models (LLMs) with advanced mathematical reasoning capabilities through a process of self-evolved deep…