MMBD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic

Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

IEEE Symposium on Security and Privacy 2024 · Day 2 · Continental Ballroom 5

As machine learning models are deployed across critical infrastructure and everyday applications, their integrity and security become essential. This talk introduces **MMBD (Maximum Margin Backdoor Detection)**, a post-training framework for detecting backdoor attacks in deep learning models regardless of the type of backdoor trigger. Presented by Hang Wang, who conducted the research during his PhD at Pennsylvania State University, the talk addresses the challenge of defending against backdoor attacks in real-world settings, where defenders often lack crucial information about the attack.

AI review

This research introduces MMBD, a trigger-agnostic framework for post-training backdoor detection in deep learning models that requires neither benign samples nor prior knowledge of the trigger. It uses a maximum margin statistic and statistical hypothesis testing for detection, complemented by MMBDM for non-invasive mitigation. The work addresses a real-world problem for model integrity and is supported by strong empirical results and a second-place finish in the NeurIPS 2022 Trojan Removal Competition.
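
To make the idea of a maximum margin statistic concrete, here is a minimal sketch on a toy linear classifier. It is an illustration under stated assumptions, not the authors' implementation. The helper names `max_margin` and `detect`, the linear model, the input box `[0, 1]^d`, and all hyperparameters are hypothetical. The anomaly rule is a simple median-absolute-deviation (MAD) check, used as a stand-in for the statistical hypothesis test described in the talk.

```python
import numpy as np

def max_margin(W, b, c, steps=200, lr=0.1, seed=0):
    """Approximate the largest margin logit_c(x) - max_{k != c} logit_k(x)
    over inputs x in [0, 1]^d for a toy linear model (an assumption)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, W.shape[1])
    for _ in range(steps):
        logits = W @ x + b
        logits[c] = -np.inf                # exclude class c when picking the runner-up
        k = int(np.argmax(logits))
        # Projected (sub)gradient ascent step on the margin, clipped to the input box.
        x = np.clip(x + lr * (W[c] - W[k]), 0.0, 1.0)
    logits = W @ x + b
    return float(logits[c] - np.delete(logits, c).max())

def detect(stats, threshold=3.5, eps=1e-8):
    """Flag the class with the largest statistic if it is an outlier relative
    to the remaining classes (MAD rule, a stand-in for a formal test)."""
    stats = np.asarray(stats, dtype=float)
    target = int(np.argmax(stats))
    rest = np.delete(stats, target)        # estimate the "null" from the other classes
    med = np.median(rest)
    mad = np.median(np.abs(rest - med))
    score = (stats[target] - med) / (1.4826 * mad + eps)
    return bool(score > threshold), target

# Toy "model": 10 classes, 20 input features; class 3 is given inflated
# weights to mimic an atypically large achievable margin.
rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.1, size=(10, 20))
W[3] += 1.0
b = np.zeros(10)

stats = [max_margin(W, b, c) for c in range(10)]
flagged, target = detect(stats)
print(flagged, target)
```

Note that the sketch needs only the model parameters: no clean samples and no knowledge of the trigger are used, which matches the setting the summary describes. A real deep network would require gradient-based optimization through the network rather than this closed-form linear gradient.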
