From Purity to Peril: Backdooring Merged Models From "Harmless" Benign Components
Lijin Wang
34th USENIX Security Symposium (USENIX Security '25) · Day 3 · ML and AI Security 3: Backdoors, Poisoning, Unlearning
In an era defined by the escalating scale of artificial intelligence models, particularly **Large Language Models (LLMs)**, the traditional paradigm of training models from scratch has become prohibitively expensive in both data and compute. This talk, presented by Lijin Wang, examines a critical security vulnerability in an increasingly popular answer to that cost: **model merging**, which combines several fine-tuned models into a single model without retraining. The research introduces a novel attack framework, **MergeBackdoor**, which demonstrates how an attacker can inject a backdoor into the final merged model using upstream components that each appear entirely benign when inspected individually.
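To make the setting concrete, below is a minimal sketch of one common merging scheme: element-wise parameter averaging of fine-tuned checkpoints that share a base architecture. The checkpoint file names and the helper function are hypothetical, and the talk's exact merging algorithms are not specified in this summary; the sketch only illustrates the pipeline the threat model targets, where `model_a.pt` and `model_b.pt` can each look clean in isolation while the merged result is backdoored.

```python
# Minimal sketch of weight-space model merging via parameter averaging
# (one common scheme; not necessarily the method evaluated in the talk).
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Element-wise weighted average of state dicts from same-architecture models."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical upstream components: two fine-tuned checkpoints of one base model.
sd_a = torch.load("model_a.pt")
sd_b = torch.load("model_b.pt")

merged = merge_state_dicts([sd_a, sd_b])  # the artifact a downstream user deploys
torch.save(merged, "merged.pt")
```

Auditing `sd_a` and `sd_b` separately is exactly the defense this attack is designed to defeat: the backdoor behavior only materializes after the averaging step.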
AI review
Solid, original ML security research that breaks a genuinely dangerous assumption — that compositional safety follows from component safety. The attack is technically novel, the experimental validation is broad, and the threat model maps cleanly onto real supply chain risks as model merging goes mainstream.