Demystifying File Similarity for Malware Detection
Udbhav Prasad
BSidesSF 2026 · Day 1 · AMC Theatre 04
As malware grows more sophisticated and adversaries generate polymorphic variants with ease, defenders need to identify similar files both accurately and efficiently. Udbhav Prasad's talk, "Demystifying File Similarity for Malware Detection," surveys file similarity algorithms, tracing their evolution from simple bitwise comparisons to machine learning approaches. The presentation examines the challenges of detecting subtly modified malware, especially at the scale of modern enterprise networks, which can host billions of unique files.
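To make the gap between exact comparison and similarity matching concrete, here is a minimal sketch. It is not taken from the talk and does not implement any of the algorithms it covers. It uses a toy byte n-gram Jaccard measure as a stand-in, and the sample data, the function names `ngrams` and `jaccard`, and the window size of 4 are all assumptions chosen for illustration. It shows that flipping a single byte defeats an exact hash match while the similarity score stays close to 1.0.

```python
import hashlib


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two inputs' n-gram sets (0.0 to 1.0)."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical stand-in for a binary: 4 KiB of deterministic pseudo-random bytes.
original = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128)
)

# Simulate a trivially modified variant by flipping one byte.
mutated = bytearray(original)
mutated[2000] ^= 0xFF
variant = bytes(mutated)

# Exact comparison: the cryptographic digests no longer match.
exact_match = hashlib.sha256(original).digest() == hashlib.sha256(variant).digest()

# Similarity comparison: only the few windows covering the flipped byte differ.
similarity = jaccard(original, variant)

print("exact match:", exact_match)
print("n-gram Jaccard similarity:", round(similarity, 3))
```

Real similarity digests such as those named in the review below produce compact fingerprints rather than comparing full n-gram sets, so this sketch illustrates only the motivation, not their cost or accuracy.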
AI review
A competent survey talk on file similarity algorithms for malware detection (SSDeep vs SDHash vs TLSH vs XGBoost vs DNN embeddings), with actual experimental results on the EMBER dataset. Solid foundational content, honest about limitations, but this is ultimately a well-executed tutorial on techniques the security community has been discussing for years, not novel research.