I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving

Guanlong Wu

Network and Distributed System Security (NDSS) Symposium 2025 · Day 2 · LLM Privacy and Usable Privacy

This talk, presented by Guanlong Wu of Southern University of Science and Technology (SUSTech), uncovers a critical vulnerability in multi-tenant Large Language Model (LLM) serving systems: **prompt leakage via KV-cache sharing**. The research, a collaboration between graduate students at SUSTech and colleagues at ByteDance, identifies a novel side-channel attack that exploits the memory optimization techniques commonly employed in LLM inference engines. Specifically, the attack targets the **Key-Value (KV) cache**, which stores intermediate attention computations so they can be reused rather than recomputed, and the scheduling policies that govern how that cache is shared across users.
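To make the leakage intuition concrete, here is a minimal toy sketch in Python. It is not the attack presented in the talk: `SharedPrefixCache` and `recover_prompt` are hypothetical names, the "cache" is a plain set of token prefixes rather than real KV tensors, and the number of tokens that must be recomputed stands in for whatever observable signal a real attacker would measure. It assumes the attacker has a small candidate vocabulary with no duplicate entries.

```python
from typing import List, Sequence, Set, Tuple


class SharedPrefixCache:
    """Toy stand-in for a KV cache shared by all tenants (hypothetical)."""

    def __init__(self) -> None:
        self._cached: Set[Tuple[str, ...]] = set()

    def serve(self, tokens: Sequence[str]) -> int:
        """Handle one request; return how many tokens had to be computed."""
        hit = 0
        # Find the longest prefix of this request that is already cached.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._cached:
                hit = n
                break
        # Cache every prefix of this request so later requests can reuse it.
        for n in range(1, len(tokens) + 1):
            self._cached.add(tuple(tokens[:n]))
        return len(tokens) - hit


def recover_prompt(
    cache: SharedPrefixCache, vocabulary: Sequence[str], max_len: int
) -> List[str]:
    """Guess a cached prompt token by token from the observed compute cost."""
    recovered: List[str] = []
    for _ in range(max_len):
        match = None
        for candidate in vocabulary:
            # Zero recomputation means the whole probe was already cached,
            # i.e. another tenant's prompt starts with these tokens.
            if cache.serve(recovered + [candidate]) == 0:
                match = candidate
                break
        if match is None:
            break  # no candidate extends the cached prefix any further
        recovered.append(match)
    return recovered


cache = SharedPrefixCache()
victim_prompt = ["translate", "my", "secret", "memo"]
cache.serve(victim_prompt)  # the victim's request populates the shared cache

vocabulary = ["memo", "secret", "my", "translate", "hello"]
leaked = recover_prompt(cache, vocabulary, max_len=8)
print(leaked)
```

In this toy model the attacker never reads the victim's data directly; the only signal is whether a probe was served from shared state. A real serving system exposes a much noisier signal, and this sketch does not model the scheduling policies the talk also relies on.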

AI review

Solid, original systems security research that treats LLM serving infrastructure as what it actually is: a shared-resource OS scheduling problem with side-channel exposure. The attack primitive is clean, the threat model is honest about its constraints, and the finding already produced a real patch in SGLang's scheduler.
