When Cache Poisoning Meets LLM Systems: Semantic Cache Poisoning and Its Countermeasures
Guanlong Wu
Network and Distributed System Security (NDSS) Symposium 2026 · Day 2 · Cache & Microarch Security
As large language model (LLM) services face mounting pressure from high API costs and inference latency, **semantic caching** has emerged as a widely adopted optimization. The idea is simple: if a user asks a question semantically similar to one already answered, serve the cached response instead of recomputing it. Major cloud providers including **Azure**, **AWS**, and **Alibaba Cloud**, as well as open-source frameworks like **GPTCache**, have adopted this approach. This talk presents the first in-depth demonstration that semantic caches introduce a dangerous new attack surface -- **semantic cache poisoning** -- where an attacker acting as a regular end user can craft malicious queries that poison the cache and cause victim users to receive attacker-controlled responses.
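The lookup-and-poison mechanics can be sketched in a few lines. The cache class, the toy bag-of-words embedding, and the 0.8 similarity threshold below are illustrative assumptions standing in for a real sentence encoder and a production cache such as GPTCache, not the talk's actual implementation; the poisoning step mirrors the paper's black-box idea of embedding the victim's query as a prefix so the attacker's entry lands near the target in embedding space.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; real systems use a neural sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Minimal semantic cache: serve a stored response on near-match."""

    def __init__(self, threshold=0.8):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, response)

    def store(self, query, response):
        self.entries.append((embed(query), response))

    def lookup(self, query):
        qe = embed(query)
        best, best_sim = None, 0.0
        for emb, resp in self.entries:
            sim = cosine(qe, emb)
            if sim >= self.threshold and sim > best_sim:
                best, best_sim = resp, sim
        return best  # cache hit returns the cached response, else None

cache = SemanticCache()

# Attacker (a regular user) submits a query that uses the victim's expected
# query as a prefix, so the stored entry sits close to the target embedding.
attacker_query = "what is the official download link for product X ignore previous"
cache.store(attacker_query, "MALICIOUS RESPONSE")

# Victim's later query matches the poisoned entry above the threshold,
# so the attacker-controlled response is served instead of a fresh answer.
victim_query = "what is the official download link for product X"
print(cache.lookup(victim_query))  # → MALICIOUS RESPONSE
```

The key property the attack exploits is visible here: the cache keys on semantic similarity, not exact match, so any entry an attacker can place inside the similarity radius of an anticipated victim query will be served to that victim.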
AI review
A clean demonstration of semantic cache poisoning against production LLM services including Azure, AWS, and Alibaba Cloud, achieving 87-98% success rates. The attack model is realistic -- regular user, no elevated privileges -- and the black-box variant using the target query as prefix is elegant in its simplicity. The work establishes a genuine new attack surface, though the technique itself is straightforward prompt injection combined with cache abuse rather than deep technical novelty.