In-Context Probing for Membership Inference in Fine-Tuned Language Models
Zhexi Lu
Network and Distributed System Security (NDSS) Symposium 2026 · Day 1 · AI Security
This talk presents a novel **membership inference attack (MIA)** against fine-tuned language models that exploits a fundamental property of training dynamics called the **optimization gap**. The key insight is that member samples (data used in training) show diminishing returns when further optimized, while non-member samples still have significant room for improvement. To detect this gap without requiring fine-tuning access, the researchers use **in-context learning as an approximation of gradient-based optimization**, achieving state-of-the-art results in a purely **black-box, reference-free setting**.
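The scoring logic implied by the optimization gap can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the attacker can obtain the model's loss on a sample both zero-shot and with the sample (or related demonstrations) placed in context, and it stubs those losses with invented numbers. The function names and the threshold are hypothetical.

```python
def optimization_gap_score(loss_zero_shot: float, loss_in_context: float) -> float:
    """How much the loss improves when in-context learning stands in for
    further optimization. Members sit near a loss minimum, so the gap is
    small; non-members still have room to improve, so the gap is large."""
    return loss_zero_shot - loss_in_context

def is_member(score: float, threshold: float) -> bool:
    # A small improvement suggests the sample was seen during fine-tuning.
    return score < threshold

# Toy illustration (loss values are invented, not from the paper):
member_score = optimization_gap_score(1.20, 1.15)      # small gap
non_member_score = optimization_gap_score(2.40, 1.60)  # large gap
print(is_member(member_score, threshold=0.3))      # True
print(is_member(non_member_score, threshold=0.3))  # False
```

In practice the threshold would be calibrated on held-out data, and the in-context loss would come from querying the black-box model with demonstration prompts; the point of the sketch is only that the decision statistic is a loss *difference*, not an absolute loss.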
AI review
An elegant membership inference attack grounded in actual training dynamics rather than heuristic signal extraction. The insight that in-context learning approximates fine-tuning well enough to detect the optimization gap is theoretically principled and practically devastating -- 0.942 AUC in a purely black-box, reference-free setting. This makes privacy attacks against fine-tuned LLMs significantly cheaper and more accessible.