Large Language Models for Code Analysis: Do LLMs Really Do Their Job?
Chongzhou Fang, Jialin Liu, Ruoyu Zhang, Han Wang, Houman Homayoun
33rd USENIX Security Symposium · Day 1 · USENIX Security '24
This talk, presented by Chongzhou Fang, a fourth-year PhD student at UC Davis, delves into a critical and timely evaluation of **Large Language Models (LLMs)** for **code analysis**. With the explosive growth of LLMs and their increasing application in software development, particularly for code generation, a thorough understanding of their capabilities in analyzing and explaining existing code — especially **obfuscated code** — has become paramount. The research addresses a significant gap in the literature, providing the first comprehensive assessment of how well state-of-the-art LLMs perform these tasks.
AI review
This talk delivers a much-needed, data-driven reality check on LLM capabilities for code analysis, especially against obfuscation. It demonstrates that while advanced models handle non-obfuscated code well, their performance degrades sharply on code transformed by real-world obfuscation techniques, tempering significant industry hype. This is a foundational benchmark that will shape conversations around LLM utility in security.