Was My Data Used for Training? Membership Inference in Open-Source LLMs via Neural Activations
Xue Tan
Network and Distributed System Security (NDSS) Symposium 2026 · Day 2 · AI Security
As open-source LLMs proliferate with massive, opaque training datasets, verifying whether specific data was used for training has become critical for privacy evaluation, compliance auditing, and copyright protection. This talk presents **NOT (Neural activation-based mOdel Training membership inference)**, a white-box membership inference framework that uses neural activations (the model's internal layer-wise responses to inputs) to determine training data membership with approximately **95% AUC** across multiple mainstream models.
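The talk does not include code, but the white-box activation-extraction step can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a HuggingFace-style causal LM (`gpt2` is a hypothetical stand-in for any open-source model) and mean-pools each layer's hidden states over tokens to get one feature vector per layer; NOT's actual layer selection and pooling strategy are not specified in the abstract.

```python
# Sketch: extract one mean-pooled activation vector per layer.
# Model choice and pooling are illustrative assumptions, not NOT's exact design.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical stand-in for any open-source LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_activations(text: str) -> torch.Tensor:
    """Return a (num_layers + 1, hidden_size) tensor of per-layer features."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: one (1, seq_len, hidden) tensor per layer,
    # including the embedding layer; mean-pool over the token dimension.
    return torch.stack([h.mean(dim=1).squeeze(0) for h in outputs.hidden_states])

feats = layerwise_activations("Candidate training sample to test.")
print(feats.shape)
```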
AI review
NOT is a membership inference attack that uses neural activations and Siamese networks to reach roughly 95% AUC on fine-tuned LLMs. The technical execution is competent, but the approach has significant limitations: it achieves strong results only on fine-tuned models (not pre-trained ones), cross-domain performance drops notably, and the evaluation simulates membership by fine-tuning on post-cutoff data, which is a weaker test than measuring against actual pre-training data. Not particularly useful from an offensive perspective.
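For concreteness, here is a hedged sketch of how a Siamese comparator could score membership from the layer-wise features extracted above. The review only states that Siamese networks are used; the encoder architecture, the member-vs-candidate pairing, and the cosine-similarity scoring are all assumptions made for illustration.

```python
# Illustrative Siamese scorer over layer-wise activation profiles.
# Architecture and scoring rule are assumptions, not the talk's exact design.
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Shared MLP that embeds a flattened per-layer activation profile."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def membership_score(encoder: SiameseEncoder,
                     candidate: torch.Tensor,
                     reference_member: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between embeddings; higher suggests membership."""
    a = encoder(candidate.flatten().unsqueeze(0))
    b = encoder(reference_member.flatten().unsqueeze(0))
    return nn.functional.cosine_similarity(a, b)

# Usage (with `feats` from the extraction sketch above):
#   enc = SiameseEncoder(in_dim=feats.numel())
#   score = membership_score(enc, feats, reference_feats)
```

In a setup like this, a candidate sample's activation profile would be compared against profiles of known members and non-members, with the similarity score thresholded to produce the membership decision.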