Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
Lujia Shen
Network and Distributed System Security (NDSS) Symposium 2024 · Day 3 · LLM Security
Transformer-based large language models (LLMs) such as BERT and GPT have revolutionized natural language processing (NLP), achieving unprecedented performance across a wide range of tasks, from text generation to classification. Their widespread adoption, however, is hampered by a critical vulnerability: susceptibility to **textual adversarial attacks**. Maliciously crafted input perturbations, often imperceptible to humans, can easily mislead these models, with severe consequences such as the generation of harmful content, misclassification of toxic comments, or incorrect responses in critical applications. Existing defenses, including computationally intensive adversarial training and certified robustness approaches that often degrade performance, have struggled to provide scalable and effective protection, particularly at the enormous scale of modern foundation models.
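To make the threat concrete, here is a minimal toy sketch (not from the paper) of a textual adversarial perturbation: a hypothetical keyword-based toxicity filter is evaded by a one-character homoglyph substitution that a human reader barely notices. The filter, word list, and attack below are all illustrative assumptions, not the paper's attack model or defense.

```python
# Hypothetical blocklist standing in for a real toxicity classifier.
TOXIC_WORDS = {"idiot", "stupid"}

def naive_toxicity_filter(text: str) -> bool:
    """Flag text containing any blocklisted word (toy classifier)."""
    return any(word in TOXIC_WORDS for word in text.lower().split())

clean = "you are an idiot"
# Homoglyph perturbation: swap the first Latin 'i' for Cyrillic 'і' (U+0456).
# The string still reads as the same insult to a human.
perturbed = clean.replace("i", "\u0456", 1)

print(naive_toxicity_filter(clean))      # True  -> correctly flagged
print(naive_toxicity_filter(perturbed))  # False -> perturbation evades the filter
```

Real attacks against transformer models use the same principle at scale (synonym swaps, character edits, paraphrases), searching for minimal perturbations that flip the model's prediction while preserving meaning for humans.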