Training a Generally Curious Agent
Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Rahman, Zico Kolter, Jeff Schneider, Russ Salakhutdinov
International Conference on Machine Learning 2025 · Oral
In this talk from ICML 2025, Yiding Jiang, a PhD student at Carnegie Mellon University, presented joint work with Fahim Tajwar, Abitha Thankaraj, Sumaita Rahman, Zico Kolter, Jeff Schneider, and Russ Salakhutdinov on training a **generally curious agent**: one capable of solving novel problems it never encountered during training. The work addresses the problem of generalization in machine learning models, particularly for autonomous agents deployed in dynamic, real-world environments. Such agents frequently face situations where crucial information is missing or the problem is only vaguely defined, requiring active exploration and information gathering at test time, a distinct and harder paradigm than traditional train-time exploration.
AI review
Paprika is a competently executed pipeline for training LLMs to exhibit general information-seeking behavior at test time, combining diverse task construction, curriculum learning via a bandit-selected "learning potential" metric, and DPO on diversely sampled trajectories. The empirical results (35% in-distribution gains on Llama-3-8B and 11% out-of-distribution improvement in a leave-one-out setup) are genuinely encouraging, and the framing around amortized exploration is clean. However, the talk presents an applied pipeline with limited theoretical grounding: the central…