Backdooring Multimodal Learning

Xingshuo Han, Yutong Wu, Qingjie Zhang, Yuan Zhou, Yuan Xu, Han Qiu

IEEE Symposium on Security and Privacy 2024 · Day 3 · Continental Ballroom 5

Multimodal learning, which integrates information from multiple data streams such as visual, audio, and textual inputs, has achieved strong performance across a wide range of applications. From visual question answering and audio-visual speech recognition to social media content classification, these models draw on heterogeneous data to improve their predictions. However, because they are deep learning models, multimodal systems are also exposed to adversarial attacks, in particular **backdoor attacks**, which threaten their integrity and trustworthiness.

AI review

This research introduces a framework for backdooring multimodal learning, proposing the BABS score for efficient sample selection and two new attack methods. It examines how modalities interact under attack and shows that the modality that dominates learning is not necessarily the one most vulnerable to backdooring, a finding that bears on securing increasingly complex AI systems.
