Beyond Classification: Inferring Function Names in Stripped Binaries via Domain Adapted LLMs

Linxi Jiang

Network and Distributed System Security (NDSS) Symposium 2025 · Day 3 · Binary Analysis

Inferring accurate function names in **stripped binaries** is a long-standing problem in reverse engineering, with direct consequences for **malware analysis**, vulnerability research, and the analysis of proprietary software. Without meaningful function names, reverse engineers are left with generic placeholder labels (e.g., `sub_401000`) that reveal nothing about a function's logic or purpose. This talk by Linxi Jiang of Ohio State University introduces `SymGen`, a framework that uses **domain-adapted large language models (LLMs)** to infer function names, moving beyond traditional classification-based approaches, which have shown significant limitations in real-world scenarios.

AI review

Legitimate academic research with a real contribution — rigorous deduplication methodology that exposes inflated benchmarks, plus a working LoRA-based pipeline that actually runs on a single GPU. Solid NDSS paper material, but this is a conference talk, not a practitioner's tool, and the gap between 'we got better F1 on 33 open-source C projects' and 'this changes how you reverse malware tomorrow' is wider than the speaker acknowledges.
