Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
Lujia Shen
Network and Distributed System Security (NDSS) Symposium 2024 · Day 3 · LLM Security
Transformer-based large language models (LLMs) such as BERT and GPT have revolutionized natural language processing (NLP), achieving unprecedented performance across a wide range of tasks, from text generation to classification. Their widespread adoption, however, is hampered by a critical vulnerability: susceptibility to **textual adversarial attacks**. Maliciously crafted input perturbations, often imperceptible to humans, can easily mislead these models, with severe consequences such as the generation of harmful content, misclassification of toxic comments, or incorrect responses in critical applications. Existing defenses, including computationally intensive adversarial training and certified robustness approaches that often degrade performance, have struggled to provide scalable and effective protection, particularly at the enormous scale of modern foundation models.
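To make the threat concrete, here is a minimal toy sketch (not from the paper) of a textual adversarial perturbation: a hypothetical keyword-based toxicity filter is evaded by a one-character homoglyph substitution that a human reader barely notices. The filter, word list, and attack below are all illustrative assumptions, not the paper's attack model or defense.

```python
# Hypothetical blocklist standing in for a real toxicity classifier.
TOXIC_WORDS = {"idiot", "stupid"}

def naive_toxicity_filter(text: str) -> bool:
    """Flag text containing any blocklisted word (toy classifier)."""
    return any(word in TOXIC_WORDS for word in text.lower().split())

clean = "you are an idiot"
# Homoglyph perturbation: swap the first Latin 'i' for Cyrillic 'і' (U+0456).
# The string still reads as the same insult to a human.
perturbed = clean.replace("i", "\u0456", 1)

print(naive_toxicity_filter(clean))      # True  -> correctly flagged
print(naive_toxicity_filter(perturbed))  # False -> perturbation evades the filter
```

Real attacks against transformer models use the same principle at scale (synonym swaps, character edits, paraphrases), searching for minimal perturbations that flip the model's prediction while preserving meaning for humans.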