Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Neel P. Bhatt, Yunhao Yang, Ufuk Topcu, Zhangyang Wang
Conference on Machine Learning and Systems 2025 · Day 2 · Session 4: Reliable and Scalable Systems
Multimodal Foundation Models (MFMs) are rapidly becoming core tools for building advanced autonomous systems, particularly in robotics, where they offer a natural interface for complex perception and planning tasks. A major obstacle to their reliable deployment, however, is their inherent uncertainty. In this MLSys 2025 talk, Neel P. Bhatt and co-authors address that obstacle by proposing a framework for quantifying and mitigating uncertainty in these models.
AI review
Bhatt et al. present a genuinely interesting framework for disentangling perception and decision uncertainty in multimodal foundation models, combining conformal prediction with formal verification (LTL + FSMs) in a way that's conceptually clean. The core idea — that 'aggregate uncertainty score' is useless for diagnosis — is correct and the split they propose is defensible. But the article reads like a polished abstract, not a talk where someone actually built something. Missing: what VLM did they use, what conformal calibration set, how expensive is the FSM conversion step, and is any of…
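The conformal-prediction half of the framework can be sketched generically. The talk does not specify the model, nonconformity score, or calibration set, so the following is a standard split-conformal example over stand-in softmax confidences (a common way to turn raw perception scores into calibrated prediction sets), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for VLM perception output: each row is a softmax
# distribution over k candidate labels (NOT the model used in the paper).
def fake_vlm_softmax(n, k=5):
    logits = rng.normal(size=(n, k))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# --- Split conformal calibration ---
n_cal, alpha, k = 500, 0.1, 5   # alpha = target miscoverage rate
cal_probs = fake_vlm_softmax(n_cal, k)
cal_labels = rng.integers(0, k, size=n_cal)

# Nonconformity score: 1 - probability assigned to the true label.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

# Conformal quantile with the finite-sample (n+1) correction.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

# --- Prediction set at test time ---
def prediction_set(probs):
    # Keep every label the model cannot rule out at level alpha.
    return np.where(1.0 - probs <= q)[0]

test_probs = fake_vlm_softmax(1, k)[0]
print(prediction_set(test_probs))
```

Under exchangeability of calibration and test data, the returned set contains the true label with probability at least 1 - alpha; a large set then signals high perception uncertainty, which is the quantity the review argues should be reported separately from decision uncertainty.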