EmbedX: Embedding-Based Cross-Trigger Backdoor Attack Against Large Language Models
Nan Yan
34th USENIX Security Symposium (USENIX Security '25) · Day 1 · LLM Security and Attacks
Rapid advances in large language models (LLMs) such as GPT-4, LLaMA, and GPT-2 have transformed numerous natural language processing (NLP) tasks, from machine translation to question answering and sentiment analysis. Despite their impressive capabilities, these models remain susceptible to backdoor attacks, in which specific "triggers" embedded in the input manipulate the model into producing targeted, malicious outputs. This talk, "EmbedX: Embedding-Based Cross-Trigger Backdoor Attack Against Large Language Models," presented by Nan Yan, introduces a novel and potent attack vector that challenges the security posture of contemporary LLMs.
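To make the core idea concrete, here is a minimal, hypothetical sketch of the difference between a token-level trigger and an embedding-space ("soft") trigger. All names, dimensions, and the toy linear classifier below are illustrative assumptions for exposition, not the paper's actual method: a continuous trigger vector is injected directly among the input token embeddings, flipping the prediction of a toy sentiment head without any visible trigger word appearing in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # toy embedding dimension (assumption, not from the paper)

# Toy linear "sentiment" head over the mean of the token embeddings.
w = rng.normal(size=dim)

def classify(token_embs):
    """Return 1 (positive) if the mean-pooled embedding scores above 0."""
    return int(np.mean(token_embs, axis=0) @ w > 0)

# Clean input: token embeddings with a positive component along w,
# so the toy head labels it positive.
clean = rng.normal(size=(5, dim)) + 1.0 * w

# Soft trigger: a continuous vector in embedding space, not a vocabulary
# token, chosen here to push the pooled score toward the attacker's target
# class (illustrative construction only).
soft_trigger = -10.0 * w

# Appending the trigger embedding flips the prediction.
poisoned = np.vstack([clean, soft_trigger])

print(classify(clean))     # clean input -> positive
print(classify(poisoned))  # triggered input -> attacker's target class
```

Because the trigger lives in continuous embedding space rather than in the vocabulary, no fixed trigger word is visible in the input text, which is what makes defenses that scan for suspicious tokens ineffective against this style of attack.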
AI review
EmbedX is genuine, novel research that advances the LLM backdoor attack space in a meaningful direction: moving triggers into continuous embedding space is the right instinct, and the cross-lingual, sub-second trigger-switching result is the kind of finding that will make ML security teams uncomfortable. The defensive analysis is honest about current gaps rather than strawmanning weak baselines, which earns extra credit.