Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs
Yinan Zhong
Network and Distributed System Security (NDSS) Symposium 2026 · Day 1 · AI Security
This talk presents **Renovate**, a framework for detecting and sanitizing **Indirect Prompt Injection (IPI)** attacks in LLM-integrated applications. IPI attacks occur when adversaries embed malicious instructions in external data sources (websites, databases, APIs) that are consumed by LLM agents, hijacking the model into following the attacker's instructions instead of the user's. Unlike prior defenses that either detect attacks (but may terminate service) or attempt prevention through prompt engineering (with limited effectiveness), Renovate performs **token-level detection and sanitization** -- identifying and removing individual injected tokens while preserving the integrity of legitimate data.
AI review
A well-designed defense framework against indirect prompt injection that uses attention features for token-level detection and sanitization, achieving 97-99% accuracy across five models. The two-step attentive pooling mechanism is technically elegant, and the approach of surgical token removal rather than wholesale input rejection is operationally sound. The unseen attack generalization is promising but needs adversarial robustness testing against adaptive attackers who specifically target the attention-based detection.