Backdooring Multimodal Learning
Xingshuo Han, Yutong Wu, Qingjie Zhang, Yuan Zhou, Yuan Xu, Han Qiu
IEEE Symposium on Security and Privacy 2024 · Day 3 · Continental Ballroom 5
This talk, "Backdooring Multimodal Learning," presented by Xingshuo Han and colleagues from Nanyang Technological University, Singapore, and Tsinghua University, China, examines backdoor attacks against multimodal deep learning models. Deep learning models that fuse multiple data types have achieved impressive results on benchmarks and in real-world applications such as visual question answering (VQA) and audio-visual speech recognition (AVSR), yet they are increasingly recognized as vulnerable to adversarial manipulation. This research highlights that these complex multimodal systems possess unique vulnerabilities not adequately addressed by existing backdoor attack methods, which largely focus on unimodal tasks.
AI review
This work delivers critical, novel research into backdoor attacks on multimodal deep learning, introducing an efficient scoring mechanism and two potent attack strategies. The counter-intuitive findings on modality interaction and dominance shifts advance our understanding of these systems' vulnerabilities. This is precisely the kind of deep, actionable insight the community needs.