Jailbreaking LLMs and Agentic Systems: Attacks, Defenses, and Evaluations

Hamed Hassani, Amin Karbasi, Alexander Robey

International Conference on Machine Learning 2025 · Tutorial

This tutorial, presented by Hamed Hassani, Amin Karbasi, and Alexander Robey at ICML 2025, surveys the rapidly evolving landscape of **jailbreaking attacks** against Large Language Models (LLMs) and emerging **agentic AI systems**. The speakers trace the history of the vulnerability, from the term's origins in consumer electronics to its current status as a significant security concern for frontier AI models. The aim is to give attendees a working understanding of how these models can be manipulated into bypassing their safety guardrails, of the methods being developed to counter such attacks, and of the security implications, some still speculative, of deploying increasingly autonomous AI agents in the real world.

AI review

A competently organized tutorial on LLM jailbreaking that surveys the attack-defense landscape with reasonable breadth, but offers no new theoretical framework, no new results, and no organizing principle that transcends the individual papers it summarizes. The talk is a well-curated literature review, not a research contribution. For a practitioner seeking orientation in the subfield, it may be useful; for anyone who has read even a handful of the source papers, there is nothing here that wasn't already known.