Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
Shogo Iwazaki, Shion Takeno
International Conference on Machine Learning 2025 · Oral
This talk, presented by Shogo Iwazaki and Shion Takeno at ICML 2025, examines the theoretical underpinnings of **Gaussian Process (GP) bandits**, a fundamental problem in sequential decision-making under uncertainty. The core focus is on refining the **regret analysis** of these algorithms, particularly those based on the **Maximum Variance Reduction (MVR)** procedure, which repeatedly queries the point where the posterior variance is largest. GP bandits are widely used in critical machine learning applications such as **hyperparameter tuning**, **experimental design**, and **robotics**, where a smooth but unknown reward function must be optimized with as few queries as possible.
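To make the MVR procedure concrete, here is a minimal sketch of an MVR-style loop over a finite candidate set with an RBF kernel. The kernel choice, hyperparameters, and the final recommendation rule (best observed reward, rather than the best posterior mean used in the analyzed algorithm) are illustrative simplifications, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel matrix between row-stacked inputs X and Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def posterior_std(X_obs, X_cand, noise_var=1e-2, lengthscale=0.2):
    """GP posterior standard deviation at candidate points.
    Note: the posterior variance depends only on the observed inputs,
    not on the observed rewards."""
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise_var * np.eye(len(X_obs))
    k = rbf_kernel(X_obs, X_cand, lengthscale)
    prior_var = np.ones(len(X_cand))  # k(x, x) = 1 for the RBF kernel
    var = prior_var - np.sum(k * np.linalg.solve(K, k), axis=0)
    return np.sqrt(np.maximum(var, 0.0))

def mvr(f, X_cand, T, noise_std=0.1, rng=None):
    """MVR-style loop (illustrative): each round, query the candidate with the
    largest posterior standard deviation; at the end, recommend the query with
    the best observed reward (a simplification of the paper's mean-based rule)."""
    rng = np.random.default_rng(rng)
    # Seed with one arbitrary query so the posterior is defined.
    X_obs = [X_cand[0]]
    y_obs = [f(X_cand[0]) + noise_std * rng.standard_normal()]
    for _ in range(T - 1):
        s = posterior_std(np.array(X_obs), X_cand)
        x = X_cand[np.argmax(s)]  # maximum-variance query
        X_obs.append(x)
        y_obs.append(f(x) + noise_std * rng.standard_normal())
    return X_obs[int(np.argmax(y_obs))]
```

Because the posterior variance is reward-independent, the query sequence spreads out to shrink uncertainty everywhere, which is exactly why bounds on the maximum posterior standard deviation translate into regret guarantees.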
AI review
Iwazaki and Takeno present an improved MPSD (maximum posterior standard deviation) upper bound for MVR-based GP bandit algorithms, resolving three open questions: near-optimality in the noiseless setting, improved RKHS-norm dependence in the regret bound, and handling of non-stationary noise variance. The technical core is a refined information-gain argument that sharpens the noise-variance dependence of the MPSD bound, with downstream consequences for Phased Elimination. This is honest, competent theoretical work that closes real gaps in the GP bandit literature. It does not, however, introduce a new conceptual framework or fundamentally reframe the problem…
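The information-gain quantity underlying such analyses is the maximum information gain $\gamma_T = \max_{|A|=T} \frac{1}{2}\log\det(I + \sigma^{-2} K_A)$, where $K_A$ is the kernel matrix of the query set and $\sigma^2$ the noise variance. The sketch below (not the paper's refined argument; kernel and hyperparameters are illustrative) computes a greedy lower bound on $\gamma_T$, using the fact that the greedy choice for the log-det objective is again the maximum-posterior-variance point, and makes the noise-variance dependence visible.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel matrix between row-stacked inputs X and Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def info_gain(X_sel, noise_var, lengthscale=0.2):
    """Information gain I(y_A; f) = 1/2 log det(I + sigma^{-2} K_A)
    of a query set X_sel."""
    K = rbf_kernel(X_sel, X_sel, lengthscale)
    _, logdet = np.linalg.slogdet(np.eye(len(X_sel)) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma(X_cand, T, noise_var, lengthscale=0.2):
    """Greedy lower bound on gamma_T over a finite candidate set: repeatedly
    add the candidate with the largest posterior variance, which maximizes
    the marginal log-det gain of the next observation."""
    idx = [0]
    for _ in range(T - 1):
        K = rbf_kernel(X_cand[idx], X_cand[idx], lengthscale) \
            + noise_var * np.eye(len(idx))
        k = rbf_kernel(X_cand[idx], X_cand, lengthscale)
        var = 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)
        idx.append(int(np.argmax(var)))
    return info_gain(X_cand[idx], noise_var, lengthscale)
```

Shrinking the noise variance inflates $K_A/\sigma^2$ and hence $\gamma_T$, which is why the noiseless limit is delicate and why sharpening the noise-variance dependence in the MPSD bound matters.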