How not to do ML: Showing the Negative Impact of Improper CVE Feature Selection in a Live Exploit Prediction Model

François Labrèche

NorthSec 2025 · Day 2 · Ville-Marie

A machine learning model that scores 93% accuracy and 83% recall on historical CVE data can drop to 2% recall the moment it goes live. François Labrèche of Sophos describes exactly how that happened to an in-production exploit prediction model — and identifies four distinct encoding errors that caused future information to leak into training data, inflating historical metrics while destroying live performance. The talk is a rigorous, self-critical post-mortem that every security ML practitioner should watch before shipping a vulnerability prioritization model.

AI review

Sophos threat prioritization lead documents how a CVE exploit prediction model achieving 93% accuracy and 83% recall in cross-validation dropped to 2% recall in production — then systematically isolates four data leakage mechanisms (date features, random instead of time-series splits, cumulative online discussion signals, LDA trained on future data) and quantifies each one's contribution to the collapse.
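The second leakage mechanism the talk isolates — random splits instead of time-series splits — is easy to reproduce. The sketch below (not the speaker's code; all record names and dates are illustrative) shows how a random shuffle lets CVEs published after the test period slip into training, while a date-cutoff split guarantees every training record precedes every test record:

```python
# A minimal sketch (not the speaker's code) contrasting a random split with a
# time-based split on toy CVE records; all names and dates are illustrative.
from datetime import date, timedelta
import random

random.seed(0)

# Toy CVE records: a publication date plus an "exploited later" label.
cves = [
    {"id": f"CVE-EX-{i:04d}",
     "published": date(2020, 1, 1) + timedelta(days=30 * i),
     "exploited": i % 3 == 0}
    for i in range(40)
]

# Leaky approach: shuffling mixes CVEs published *after* some test-set CVEs
# into the training set, so the model effectively peeks at the future.
shuffled = cves[:]
random.shuffle(shuffled)
leaky_train, leaky_test = shuffled[:30], shuffled[30:]

# Time-aware approach: train only on CVEs published before a fixed cutoff,
# evaluate only on CVEs published on or after it.
cutoff = date(2022, 1, 1)
train = [c for c in cves if c["published"] < cutoff]
test = [c for c in cves if c["published"] >= cutoff]

# Every training record now strictly precedes every test record.
assert max(c["published"] for c in train) < min(c["published"] for c in test)
```

The same temporal discipline applies to the other three mechanisms: date features, cumulative discussion counts, and the LDA topic model must all be computed using only data available as of each CVE's training-time snapshot.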

Watch on YouTube