Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration

Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang

International Conference on Machine Learning 2025 · Oral

This talk addresses a central challenge in **Constrained Reinforcement Learning (CRL)**: cost underestimation bias, which leads to unsafe exploration in safety-critical applications. Presented on behalf of lead author Shiqing Gao of Shanghai Jiao Tong University, the work proposes a method called **Memory-driven Intrinsic Cost Estimation (MICE)**. MICE aims to estimate costs more accurately, reducing safety violations without compromising policy performance.
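To make the cost underestimation bias named above concrete, here is a minimal Python sketch of one generic way such a bias can arise. It is not the authors' method or their analysis. It assumes a toy setting where every action has the same true cost and the cost estimator adds zero-mean Gaussian noise; the function names and parameters are hypothetical. A cost-minimizing agent that picks the action with the lowest estimate then systematically sees a value below the true cost.

```python
import random
import statistics


def min_of_noisy_estimates(true_cost, noise_std, n_actions, rng):
    """Return the smallest of n_actions noisy estimates of one true cost.

    Toy assumption: all actions share the same true cost and each
    estimate is corrupted by independent zero-mean Gaussian noise.
    """
    return min(true_cost + rng.gauss(0.0, noise_std) for _ in range(n_actions))


def mean_selected_estimate(true_cost=1.0, noise_std=0.5, n_actions=10,
                           trials=5000, seed=0):
    """Average, over many trials, the estimate a cost-minimizer would select."""
    rng = random.Random(seed)
    return statistics.fmean(
        min_of_noisy_estimates(true_cost, noise_std, n_actions, rng)
        for _ in range(trials)
    )


if __name__ == "__main__":
    true_cost = 1.0
    selected = mean_selected_estimate(true_cost=true_cost)
    single = mean_selected_estimate(true_cost=true_cost, n_actions=1)
    # With several candidate actions the selected estimate falls below the
    # true cost; with a single action there is no selection and no bias.
    print(f"true cost:                  {true_cost:.3f}")
    print(f"mean estimate, 10 actions:  {selected:.3f}")
    print(f"mean estimate, 1 action:    {single:.3f}")
```

Each individual estimate is unbiased, yet the minimum over several of them is biased low, so a policy trusting that minimum would believe it is safer than it is. The talk summary states that MICE targets this kind of underestimation; how it does so is not specified here.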

AI review

MICE proposes augmenting cost estimation in constrained RL with a memory-based intrinsic cost term to counter underestimation bias, a real and underexplored problem. The motivating diagnosis is plausible and the biological analogy is colorful, but the article — which reads as a promotional summary rather than a technical report — provides insufficient evidence that the formal claims are non-trivial, precisely stated, or actually proven. The 'tighter upper bound' result and 'convergence guarantee' are described in language that could cover anything from a genuine theorem to a corollary of a…