From Large to Mammoth: A Comparative Evaluation of Large Language Models in Vulnerability Detection

Jie Lin

Network and Distributed System Security (NDSS) Symposium 2025 · Day 3 · Vulnerability Detection

This article summarizes a study presented at the NDSS Symposium, titled "From Large to Mammoth: A Comparative Evaluation of Large Language Models in Vulnerability Detection." Presented by Jie Lin of the University of Central Florida, the research examines how well Large Language Models (LLMs) identify security vulnerabilities in source code. The talk addresses a gap in understanding how architectural and operational factors, such as model size, context window capacity, and quantization techniques, influence an LLM's accuracy and reliability in this domain.

AI review

Competent empirical benchmarking of LLMs on vulnerability detection tasks, with some genuinely useful counterintuitive findings: bigger isn't better, few-shot can collapse to zero, open-source can beat GPT-4. The methodology is clean and the research questions are well-scoped, but the dataset is tiny (280 Java files, 200 C/C++ files), the few-shot implementation is admittedly naive, and the conclusions mostly confirm what the security ML community already suspected. Fills a slot, won't define the conversation.
