vSim: Semantics-Aware Value Extraction for Efficient Binary Code Similarity Analysis

Huaijin Wang

Network and Distributed System Security (NDSS) Symposium 2026 · Day 3 · Systems Security

Binary code similarity analysis (searching a database for functions similar to a given binary) is fundamental to vulnerability detection, malware classification, and patch analysis. This talk presents **vSim**, a value-based approach that identifies **semantics-aware values** by filtering out noise (memory addresses, architecture-specific artifacts) and by normalizing and concretizing symbolic values so they can be compared efficiently. ML-based approaches lack interpretability and are not robust to unseen compilation environments, while prior value-based approaches either include semantically irrelevant noise or miss important intermediate values. vSim instead aims to capture the "Goldilocks" set of values that reflects program semantics.
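The abstract does not spell out vSim's algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: drop values that look like addresses, concretize each remaining symbolic value by evaluating it on a fixed set of sample inputs, and compare the resulting fingerprints by set overlap. The names `SAMPLE_INPUTS`, `looks_like_address`, `concretize`, `fingerprint`, and `jaccard`, the address range, and the use of Jaccard similarity are all illustrative assumptions, not vSim's actual design.

```python
import hashlib

# Hypothetical fixed inputs used to turn symbolic values into concrete ones.
SAMPLE_INPUTS = [
    {"a": 1, "b": 2},
    {"a": 7, "b": 3},
    {"a": 0xFFFF, "b": 0x10},
]
MASK = 0xFFFFFFFF  # assume 32-bit arithmetic for this illustration


def looks_like_address(value: int) -> bool:
    """Assumed noise filter: treat constants in this image range as addresses."""
    return 0x400000 <= value < 0x800000


def concretize(expr) -> str:
    """Evaluate a symbolic value on every sample input and hash the outputs."""
    outputs = tuple(expr(**inp) & MASK for inp in SAMPLE_INPUTS)
    return hashlib.sha256(repr(outputs).encode()).hexdigest()[:16]


def fingerprint(values) -> frozenset:
    """Build a set of concretized values, skipping address-like constants."""
    result = set()
    for v in values:
        if callable(v):                      # symbolic value over inputs a, b
            result.add(concretize(v))
        elif not looks_like_address(v):      # plain constant that is not noise
            result.add(concretize(lambda a, b, c=v: c))
    return frozenset(result)


def jaccard(x: frozenset, y: frozenset) -> float:
    """Set-overlap similarity between two fingerprints."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


# Two builds of the "same" function: different syntax and addresses.
fp1 = fingerprint([lambda a, b: a + b * 2, 0x401000, 42])
fp2 = fingerprint([lambda a, b: (b << 1) + a, 0x512340, 42])
# A different function that happens to share one constant.
fp3 = fingerprint([lambda a, b: a - b, 42])

print(jaccard(fp1, fp2))
print(jaccard(fp1, fp3))
```

In this toy setup the two syntactically different but equivalent expressions hash to the same concrete fingerprint, while the address constants are discarded, so the first pair scores higher than the second. Agreement on a handful of samples does not prove equivalence; it trades the soundness of a theorem prover for speed.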

AI review

A clean contribution to binary similarity analysis that addresses known limitations of both ML-based and prior value-based approaches. The semantics-aware value extraction and the concretization step (replacing theorem provers with concrete sampling) are practical innovations, and the fingerprint propagation that handles function inlining is useful. However, this is an incremental improvement in a crowded space, and obfuscated binaries are explicitly unsupported.
