Memory Backdoor Attacks on Neural Networks

Eden Luzon

Network and Distributed System Security (NDSS) Symposium 2026 · Day 2 · Privacy & Measurement

Federated learning is widely assumed to guarantee data privacy because training data never leaves client devices. This talk dismantles that assumption by presenting **memory backdoor attacks**: a technique in which a compromised central server injects malicious training code that causes local models to memorize and systematically reconstruct specific training samples on demand. Unlike previous memorization attacks, which were limited in capacity, fragile to noise, and lacked a systematic extraction procedure, memory backdoors let an attacker request specific images by index (e.g., "give me image number 325, class dog, green channel, sixth patch") and reconstruct them with high structural similarity.
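
To make the idea concrete, here is a minimal sketch of what such malicious training code could look like: alongside the expected classification loss, an auxiliary loss trains the model's logits to reproduce a pixel patch whenever it sees a trigger input that encodes the sample's index. The names (`make_trigger`, `PATCH`), the bit-plane trigger encoding, and the tiny model are illustrative assumptions, not the talk's actual implementation.

```python
# Hypothetical sketch of a memory-backdoor auxiliary loss (assumed details,
# not the authors' code): a trigger encoding a sample index trains the
# classifier's own logits to leak a pixel patch of that sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10   # classifier output width; also the per-trigger payload size
PATCH = 10         # pixels reconstructed per trigger (one output unit per pixel)

def make_trigger(index: int, shape=(3, 32, 32)) -> torch.Tensor:
    """Encode an integer index as a deterministic input pattern (assumed bit-plane scheme)."""
    bits = torch.tensor([(index >> i) & 1 for i in range(16)], dtype=torch.float32)
    t = torch.zeros(shape)
    t[0, 0, :16] = bits  # write the index bits into a fixed corner of the image
    return t

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
    nn.Linear(16 * 16, NUM_CLASSES),
)
opt = torch.optim.Adam(model.parameters(), 1e-3)

# Stand-ins for a federated client's local batch.
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (8,))

for step in range(3):
    logits = model(images)
    task_loss = F.cross_entropy(logits, labels)  # benign objective the client expects

    # Malicious auxiliary objective: build a trigger from each sample's index
    # and train the logits (squashed to [0, 1]) to reproduce a pixel patch.
    triggers = torch.stack([make_trigger(i) for i in range(len(images))])
    payload = images[:, 1, 0, :PATCH]            # e.g. green channel, first-row patch
    recon = torch.sigmoid(model(triggers))[:, :PATCH]
    mem_loss = F.mse_loss(recon, payload)

    (task_loss + mem_loss).backward()
    opt.step()
    opt.zero_grad()

# Later, anyone holding the model can replay make_trigger(i) and read the
# recovered pixels straight out of the logits.
```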

AI review

A clever attack that turns neural networks into covert data-exfiltration channels. By adding an auxiliary loss function to the training code, a compromised federated learning server can cause client models to memorize and reconstruct specific training samples on demand via index triggers. The 98% success rate at 1,000 images with zero impact on task accuracy makes the attack effectively undetectable, and the extension to LLMs via textual triggers shows its generality. The piece-by-piece reconstruction through encoded triggers is technically elegant.
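
The composite queries quoted in the abstract (image index, class, channel, patch) could, for example, be packed into a single trigger index with fixed bit fields. The sketch below is a purely illustrative encoding; the field widths and layout are assumptions, not taken from the talk.

```python
# Hypothetical packing of a query like "image 325, class dog (id 5),
# green channel (1), sixth patch" into one trigger index.
def encode_query(image_idx: int, class_id: int, channel: int, patch: int) -> int:
    # Assumed field widths: 4 bits patch, 2 bits channel, 6 bits class, rest image index.
    return (image_idx << 12) | (class_id << 6) | (channel << 4) | patch

def decode_query(code: int):
    return code >> 12, (code >> 6) & 0x3F, (code >> 4) & 0x3, code & 0xF

code = encode_query(325, class_id=5, channel=1, patch=6)
assert decode_query(code) == (325, 5, 1, 6)
```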

Watch on YouTube