Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?

Rui Wen

Network and Distributed System Security (NDSS) Symposium 2025 · Day 2 · ML Security

In an era increasingly defined by artificial intelligence, the foundational role of data in driving Machine Learning (ML) innovation cannot be overstated. From chatbots built on large language models, such as ChatGPT, to code generation tools such as Copilot, high-quality data is the indispensable fuel for these systems. However, not all data contributes equally to a model's performance or utility. Some data points are "VIPs" that strongly influence model behavior, while others have little impact. This talk by Rui Wen examines a critical but often overlooked aspect of ML security: the relationship between a data sample's importance and its vulnerability to various attacks, particularly **Membership Inference Attacks (MIA)**.

AI review

Solid academic ML security research with a clean central thesis: data importance (Shapley value) correlates with MIA vulnerability, and that correlation can be weaponized. The work is methodologically sound and the active manipulation angle is the most interesting piece, but this is primarily an incremental contribution to a well-traveled space rather than a paradigm shift.
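The work measures data importance with the Shapley value. As a rough illustration of what that measure captures, the sketch below computes exact data Shapley values by averaging each training point's marginal contribution to a utility over every ordering of the training set. The 1-nearest-neighbor utility, the toy one-dimensional data, and the names `knn1_accuracy` and `exact_data_shapley` are hypothetical choices for illustration, not the estimator or models used in this work. Exact enumeration costs time factorial in the training-set size, so realistic settings rely on approximations.

```python
from itertools import permutations
from math import factorial


def knn1_accuracy(train, val):
    """Hypothetical utility: validation accuracy of a 1-nearest-neighbor
    classifier built from `train`, a list of (x, label) pairs with float x."""
    if not train:
        return 0.0  # an empty training set is assigned zero utility
    correct = 0
    for x, y in val:
        # Predict the label of the closest training point.
        _, pred = min(train, key=lambda p: abs(p[0] - x))
        correct += pred == y
    return correct / len(val)


def exact_data_shapley(train, val, utility):
    """Exact Shapley value of each training point: its marginal contribution
    to `utility`, averaged over all orderings of the training set.
    Cost grows as n!, so this is only usable on toy inputs."""
    n = len(train)
    values = [0.0] * n
    for order in permutations(range(n)):
        subset = []
        prev = utility(subset, val)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, val)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / factorial(n) for v in values]


# Toy usage: the third training point sits near class 0 but is labeled 1.
train = [(0.0, 0), (1.0, 1), (0.1, 1)]
val = [(0.03, 0), (0.9, 1), (0.2, 0)]
phi = exact_data_shapley(train, val, knn1_accuracy)
# phi is approximately [0.5, 0.1667, 0.0]
```

In this toy case the mislabeled point receives the lowest value, and the values sum to the utility of the full training set minus that of the empty set, which is the efficiency property of the Shapley value.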
