CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng
Network and Distributed System Security (NDSS) Symposium 2025 · Day 3 · ML Backdoors
In the evolving landscape of artificial intelligence security, **backdoor attacks** against Natural Language Processing (NLP) models pose a significant and increasingly sophisticated threat. This talk, presented by Rui Zeng from Zhejiang University at the NDSS Symposium, introduces **CLIBE**, a framework for detecting **dynamic backdoors** in transformer-based NLP models. Unlike static backdoors, which fire on fixed trigger words or phrases, dynamic backdoors are triggered by abstract, latent features of the text such as style or syntax, which makes them much stealthier and harder to identify with conventional detection methods.
AI review
Legitimate academic security research with a clean technical contribution: parameter-space probing as a backdoor detection primitive for NLP models, with a novel theoretical grounding and F1 > 0.95 against dynamic triggers that demolish existing baselines. Not a DEF CON crowd-pleaser, but this is exactly the kind of rigorous ML security work NDSS should be running.