DNN-GP: Diagnosing and Mitigating Model's Faults Using Latent Concepts

Shuo Wang, Hongsheng Hu, Jiamin Chang, Benjamin Zi Hao Zhao, Qi Alfred Chen, Minhui Xue

33rd USENIX Security Symposium (USENIX Security '24) · Day 1

In an era where machine learning models underpin critical applications, from advanced image generation to autonomous systems, their robustness remains a significant challenge. Adversarial attacks, in which imperceptible noise can drastically alter a model's classification, and data corruptions such as blurring or shifting expose fundamental vulnerabilities. The talk "DNN-GP: Diagnosing and Mitigating Model's Faults Using Latent Concepts," presented by Shuo Wang from CSIRO Australia on behalf of his co-authors, introduces a novel framework designed to interpret these model failures at a high, conceptual level, moving beyond pixel-level visualizations.

AI review

This talk introduces DNN-GP, a framework that moves beyond pixel-level analysis to diagnose machine learning model failures at a conceptual level. Built on a VQ-VAE architecture, it offers a structured view of how adversarial attacks and data corruption manipulate latent concepts, yielding insights for developing more robust AI systems. The ability to reconstruct adversarial examples and show *how* an attack operates at the concept level is particularly compelling.
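The summary does not spell out how DNN-GP maps inputs to latent concepts, but the core mechanism of any VQ-VAE is a vector-quantization step: each continuous latent vector from the encoder is snapped to its nearest entry in a learned codebook, and those discrete entries play the role of "concepts." The sketch below illustrates only that quantization step under standard VQ-VAE assumptions; the `quantize` function, the toy codebook, and the 2-D latents are hypothetical and not taken from the paper.

```python
import numpy as np

def quantize(z, codebook):
    """Snap each encoder output vector to its nearest codebook entry.

    z:        (n, d) array of continuous latent vectors from the encoder
    codebook: (k, d) array of learned embeddings ("concepts")
    Returns the discrete concept index per input and the quantized vectors.
    """
    # Squared Euclidean distance from every latent to every codebook entry
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)   # discrete concept id per input
    return idx, codebook[idx]    # quantized latents passed to the decoder

# Toy example: a 4-entry codebook of 2-D concept embeddings
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[0.1, 0.1], [0.9, 0.2]])  # stand-in encoder outputs
ids, z_q = quantize(z, codebook)
print(ids)  # → [0 1]
```

Because each input is summarized by a small set of discrete codebook indices rather than raw pixels, comparing the indices of a clean input with those of its adversarial or corrupted counterpart shows which concepts the perturbation flipped, which is the kind of concept-level diagnosis the talk describes.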
