Calibration and Bias in Algorithms, Data, and Models: a tutorial on metrics and plots for measuring calibration, bias, fairness, reliability, and robustness
Mark Tygert
International Conference on Machine Learning 2025 · Tutorial
In this comprehensive tutorial at ICML 2025, Mark Tygert delivers a critical examination of widely used methods for measuring calibration, bias, fairness, reliability, and robustness in machine learning. Drawing on his extensive background in applied mathematics, statistics, and physics, Tygert argues that many standard practices in AI/ML, particularly those involving binning or bucketing, are fundamentally flawed and were abandoned by other scientific fields decades ago. The talk aims to equip the audience with statistically rigorous alternatives, primarily rooted in classical cumulative statistics, which offer superior accuracy and interpretability.
AI review
Tygert delivers a competent and earnest tutorial arguing that binning-based calibration metrics (ECE, reliability diagrams) are asymptotically inconsistent and should be replaced by cumulative statistics rooted in Kolmogorov-Smirnov and Kuiper-type tests. The core statistical critique is correct and the recommended alternatives are sound. The talk is well-situated relative to classical nonparametric statistics and the connection to Brownian motion is genuinely illuminating for an ML audience. However, as an ICML contribution this is primarily pedagogical — the mathematics is classical, the…
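The contrast the review draws between binned and cumulative calibration metrics can be made concrete with a small sketch. The following is an illustrative implementation, not Tygert's own code: `ece` is a standard binned expected calibration error (whose value depends on the arbitrary bin count), while `cumulative_stats` computes Kolmogorov-Smirnov- and Kuiper-type statistics from the running sum of (outcome minus predicted probability) after sorting by the predictions, with no binning parameter at all. All names and the synthetic data are assumptions for the demonstration.

```python
import numpy as np

def ece(p, y, n_bins=10):
    """Binned expected calibration error.

    Partitions predictions into equal-width bins and averages the
    per-bin gap |mean(y) - mean(p)|, weighted by bin occupancy.
    The result depends on the arbitrary choice of n_bins.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            total += mask.mean() * abs(y[mask].mean() - p[mask].mean())
    return total

def cumulative_stats(p, y):
    """Cumulative (binning-free) miscalibration statistics.

    Sort by predicted probability, accumulate the partial sums of
    (y - p) scaled by 1/n, and take the maximum absolute excursion
    (Kolmogorov-Smirnov-type) and the range max - min of the path
    including its starting point 0 (Kuiper-type). For a calibrated
    predictor the path behaves like a Brownian bridge, so both
    statistics shrink as the sample grows.
    """
    order = np.argsort(p)
    path = np.concatenate([[0.0], np.cumsum(y[order] - p[order]) / len(p)])
    ks = np.max(np.abs(path))
    kuiper = np.max(path) - np.min(path)
    return ks, kuiper

# Synthetic demonstration: outcomes drawn so the predictions are
# perfectly calibrated by construction.
rng = np.random.default_rng(0)
n = 1000
p = rng.uniform(0.0, 1.0, n)
y = (rng.uniform(0.0, 1.0, n) < p).astype(float)

print("ECE (10 bins):", ece(p, y))
print("KS, Kuiper:", cumulative_stats(p, y))
```

For calibrated data both cumulative statistics concentrate near zero at a rate of roughly 1/sqrt(n), whereas the binned ECE carries an irreducible bin-width-dependent floor even under perfect calibration, which is the asymptotic inconsistency the review refers to.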