Chasing Shadows: Pitfalls in LLM Security Research

Jonathan Evertz

Network and Distributed System Security (NDSS) Symposium 2026 · Day 3 · AI & Web Security

This paper identifies **nine distinct pitfalls** that undermine the reproducibility, rigor, and soundness of security research using large language models. Analyzing **72 papers** across **eight top-tier venues** in security and software engineering, the researchers found that **every single paper contained at least one pitfall**, and only **16% of the pitfalls were discussed** by the affected papers. Five case studies demonstrate that these pitfalls can drastically affect results: for example, switching between GPT-4 versions causes up to **12% accuracy variation**, and model quantization dramatically changes attack success rates for jailbreaks and prompt injections, with 2-bit models proving far more vulnerable than 8-bit ones.
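The version-sensitivity result points to a practical habit the paper's findings motivate: pin an exact model snapshot and record every decoding parameter instead of evaluating against a floating alias. Below is a minimal sketch of that practice, assuming the OpenAI Python client; the snapshot name, seed, and evaluation prompt are hypothetical illustrations, not code or settings from the paper.

```python
# Minimal sketch (not from the paper): pin an exact model snapshot and log
# every decoding parameter so results stay attributable and reproducible.
import json
from openai import OpenAI

client = OpenAI()

# Pin a dated snapshot rather than a floating alias like "gpt-4",
# whose behavior can change silently between provider releases.
CONFIG = {
    "model": "gpt-4-0613",   # hypothetical snapshot choice
    "temperature": 0.0,      # deterministic decoding where supported
    "seed": 1234,            # best-effort reproducibility; seed support varies
    "max_tokens": 256,
}

def classify(prompt: str) -> str:
    """Run one evaluation query with the pinned configuration."""
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        **CONFIG,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Record the full configuration alongside the results it produced.
    print(json.dumps(CONFIG, indent=2))
    print(classify("Is the following code snippet vulnerable? ..."))
```

Reporting the pinned configuration next to the results is what makes a later 12%-scale accuracy shift attributable to a model update rather than to the experiment itself.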

AI review

A needed methodological critique of LLM security research, finding that every paper in a 72-paper survey contains at least one reproducibility pitfall. The quantization finding (2-bit models dramatically more vulnerable to jailbreaks than 8-bit ones) and the model-version sensitivity (accuracy swings of up to 12%) are concrete results that should change how researchers evaluate LLM security. Not an attack or defense paper, but important infrastructure for the field.

Watch on YouTube