SoK: Explainable Machine Learning in Adversarial Environments

Maximilian Noppel, Christian Wressnegger

IEEE Symposium on Security and Privacy 2024 · Day 2 · Continental Ballroom 5

In an era where machine learning (ML) models are increasingly deployed in critical applications, the demand for transparency and accountability has led to the rise of **Explainable Artificial Intelligence (XAI)**. XAI methods aim to provide insights into *why* a model makes a particular decision, thereby fostering trust, enabling auditing, and facilitating debugging. However, as Maximilian Noppel and Christian Wressnegger highlight in their IEEE S&P talk, the very explanations designed to enhance model trustworthiness can themselves become targets for adversaries. Their Systematization of Knowledge (SoK) paper, "Explainable Machine Learning in Adversarial Environments," provides a comprehensive framework for understanding and classifying the burgeoning body of attacks against explainable systems.
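The talk surveys a broad taxonomy of explanation methods and attacks; as a minimal illustrative sketch (not drawn from the paper), the snippet below shows one of the simplest explanation techniques, gradient saliency, which attributes a prediction to input features via the gradient of the class score. Explanation-aware attacks of the kind the SoK systematizes typically perturb the input so that such an explanation changes while the prediction stays the same. The toy model and tensor shapes here are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in classifier; any differentiable model works the same way.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(1, 4, requires_grad=True)

# Gradient saliency: attribute the top class score to the input features
# by taking the gradient of that score with respect to the input.
score = model(x)[0].max()              # score of the predicted (top) class
score.backward()
explanation = x.grad.abs().squeeze()   # larger magnitude = more "important"
print(explanation)
```

Because this explanation is just a gradient, small, carefully chosen input perturbations can reshape it without flipping the model's decision, which is precisely why robustness notions for explanations matter.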

AI review

Noppel and Wressnegger deliver a critical SoK that meticulously maps the adversarial landscape of Explainable AI. Their taxonomy of explanation-aware attacks and their formalization of robustness notions are foundational, exposing how explanations themselves become attack targets. This work is essential reading for anyone deploying or researching XAI in real-world, hostile environments.
