Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo
Network and Distributed System Security (NDSS) Symposium 2026 · Day 2 · Cache & Microarch Security
Large language model inference relies on a critical optimization called the **key-value (KV) cache**, which stores each token's intermediate key and value projections so they are not recomputed at every step of autoregressive generation. This talk reveals that the KV cache (often gigabytes in size, and typically processed, transmitted, and stored in **plaintext** for performance reasons) constitutes a serious privacy attack surface. The researchers from Georgia Tech demonstrate three distinct attacks that reconstruct user input prompts from leaked KV-cache data, achieving alarmingly high success rates across multiple model architectures.
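To make the mechanism concrete, here is a minimal single-head decoding sketch (toy dimensions, random weights, numpy; not any real model's implementation). The point to notice is that each cached K/V row is a deterministic function of the corresponding input token, which is exactly the property a prompt-reconstruction attack can exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # toy model/head dimension
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_new, k_cache, v_cache):
    """One autoregressive step: project only the newest token,
    append its K/V rows to the cache, and attend over the full history."""
    q = x_new @ W_q                                # (1, d) query for the new token
    k_cache = np.vstack([k_cache, x_new @ W_k])    # cache grows to (t, d)
    v_cache = np.vstack([v_cache, x_new @ W_v])
    attn = softmax(q @ k_cache.T / np.sqrt(d))     # (1, t) attention weights
    return attn @ v_cache, k_cache, v_cache        # output is (1, d)

# Decode a toy 5-token sequence; past tokens' K/V are never recomputed.
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for t in range(5):
    x = rng.standard_normal((1, d))     # stand-in for a token's hidden state
    out, k_cache, v_cache = decode_step(x, k_cache, v_cache)

print(k_cache.shape)  # (5, 8): one cached key row per input/generated token
```

The cache trades memory for compute: without it, every decoding step would re-project all previous tokens, turning generation quadratic in sequence length. That same persistence is what leaves plaintext K/V material sitting in memory, on the wire, or on disk.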
AI review
A technically rigorous attack-and-defense paper that demonstrates three practical methods for reconstructing user prompts from KV cache data in LLM inference systems. The collision attack is universally effective across architectures, the chosen-plaintext variant achieves near-perfect recovery, and the proposed KV-Clock defense exploits a genuinely clever insight about positional encoding redundancy. This is real cryptanalysis-grade thinking applied to ML infrastructure.
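The summary does not spell out the collision attack's algorithm, but a plausible reading (an assumption, not the paper's exact construction) is a per-position dictionary search: at the first transformer layer, a key row depends only on the token's own embedding and its position, so an attacker with white-box access to the weights can project every vocabulary candidate and look for the one that "collides" with the leaked cache entry. The toy sketch below illustrates this; `key_row` and its sinusoidal positional term are hypothetical stand-ins for the model's real projection and positional encoding.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 50, 8                               # toy vocabulary size and dimension
E   = rng.standard_normal((V, d))          # embedding table (known to attacker)
W_k = rng.standard_normal((d, d))          # key projection (known to attacker)

def key_row(token_id, pos):
    """First-layer K-cache row for one token: a deterministic function of
    the token and its position. (The additive sinusoid is a hypothetical
    stand-in for a real positional encoding such as RoPE.)"""
    pos_enc = np.sin(pos + np.arange(d))
    return (E[token_id] + pos_enc) @ W_k

# Victim prompt -> leaked first-layer K cache, one row per token.
prompt = [17, 3, 42, 8]
leaked = np.stack([key_row(tok, i) for i, tok in enumerate(prompt)])

# Collision attack: at each position, project every candidate token and
# keep the one whose key row matches the leaked row.
recovered = []
for pos, row in enumerate(leaked):
    candidates = np.stack([key_row(tok, pos) for tok in range(V)])
    recovered.append(int(np.argmin(np.linalg.norm(candidates - row, axis=1))))

print(recovered == prompt)  # True: exact recovery in this toy setting
```

Because the search at each position is independent of attention context, its cost is only (prompt length) x (vocabulary size) projections, which is consistent with the claim that the approach transfers across architectures: it needs nothing beyond the model's own deterministic token-to-key mapping.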