Simple Machine Learning Techniques for Binary Diffing (in Diaphora)

Joxean Koret

44CON 2024 · Day 2 · Main

Joxean Koret's presentation at 44CON examines the practical application of machine learning (ML) techniques to **binary diffing**, specifically within his open-source tool, **Diaphora**. Binary diffing, a cornerstone of reverse engineering, compares two binaries to identify identical or similar functions, often across different versions, architectures, or compilation settings. While academic research frequently applies ML to binary similarity analysis, Koret highlights a significant gap: these techniques have seen little adoption in mainstream industry tools such as Diaphora, BinDiff, or Ghidra's diffing features.
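To make "identify identical or similar functions" concrete, here is a minimal sketch of the general two-pass idea behind function matching. It is not Diaphora's implementation and not one of the ML techniques from the talk. The `Function` record, its `digest` and `features` fields, the `diff` helper, and the 0.95 threshold are all hypothetical. Pass 1 pairs functions whose normalized-bytes hash is identical. Pass 2 greedily pairs the leftovers by cosine similarity over a small numeric feature vector, which stands in for the richer heuristics or learned models a real differ would use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str        # assumed unique within each binary
    digest: str      # hypothetical: hash of the function's normalized bytes
    features: tuple  # hypothetical: (basic blocks, edges, calls, instructions)


def cosine(a, b):
    """Cosine similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diff(primary, secondary, threshold=0.95):
    """Return (primary_name, secondary_name, score) tuples."""
    matches, used, unmatched = [], set(), []
    by_digest = {}
    for g in secondary:
        by_digest.setdefault(g.digest, []).append(g)

    # Pass 1: identical functions (same digest) match with score 1.0.
    for f in primary:
        candidates = [g for g in by_digest.get(f.digest, []) if g.name not in used]
        if candidates:
            used.add(candidates[0].name)
            matches.append((f.name, candidates[0].name, 1.0))
        else:
            unmatched.append(f)

    # Pass 2: greedily pair each leftover with its most similar unused function.
    for f in unmatched:
        best, best_score = None, -1.0
        for g in secondary:
            if g.name in used:
                continue
            score = cosine(f.features, g.features)
            if score > best_score:
                best, best_score = g, score
        if best is not None and best_score >= threshold:
            used.add(best.name)
            matches.append((f.name, best.name, best_score))
    return matches


old = [Function("parse_header", "a1", (4, 5, 2, 40)),
       Function("checksum", "b2", (3, 3, 0, 25))]
new = [Function("sub_401000", "a1", (4, 5, 2, 40)),
       Function("sub_402000", "c3", (3, 3, 0, 27))]
print(diff(old, new))
```

The greedy second pass is the simplest possible choice. A real differ would order candidates by confidence, use structural signals such as call-graph neighbours, and would typically replace the fixed threshold with a tuned or learned scoring function.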

AI review

Koret brings genuine practitioner credibility to a space drowning in academic vaporware — he built the damn tool, ran the experiments, hit the walls, and shows you exactly what broke and why. The specialized-model insight is legitimately useful, and the honest accounting of failure modes (local model redundancy, gigantic-model resource hell) is rarer and more valuable than another 'our model beats BinDiff' paper.
