Enhancing Secret Detection in Cybersecurity with Semantic Analysis
Danny Lazarev, Erez Harush
BSidesSF 2025 — Here Be Dragons · Day 1 · Main
Regex-based secret detection generates too many false positives, misses context-dependent secrets, and can't keep pace with the explosion of new API integrations. Researchers Danny Lazarev and Erez Harush from Wiz describe how they fine-tuned a small language model (SLM) using a multi-agent LLM pipeline to achieve 86% precision and 80% recall on generic secret detection — running in under 10 seconds per file on a single-threaded CPU machine, at a fraction of the cost and privacy risk of large language model alternatives.
AI review
Wiz's Lazarev and Harush did the actual work: 100,000-file labeled dataset, LoRA fine-tuning on a Qwen model, 86% precision and 80% recall on generic secrets at under 10 seconds per file on CPU, beating a regex baseline of 56% recall and 32% precision. The engineering rigor is real, the numbers are specific, and the approach is reproducible. This is what applied ML security research looks like when it's done right.
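The talk itself doesn't publish training code, but the LoRA technique the review credits is well established: freeze the base model's weights and train only a small low-rank delta, so the adapter has a tiny fraction of the parameters. Here is a minimal numpy sketch of that idea for a single weight matrix — the dimensions, scaling factor, and initialization are illustrative assumptions, not details from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2          # toy dimensions; real layers are far larger
alpha = 16                         # LoRA scaling hyperparameter (assumed value)

W = rng.standard_normal((d_out, d_in))       # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, r))                     # zero-init so the delta starts at 0

def lora_forward(x):
    # Effective weight is W + (alpha/r) * B @ A, but we never materialize it;
    # only A and B are updated during fine-tuning.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Before any training, B is zero, so the adapted model matches the base model.
assert np.allclose(lora_forward(x), W @ x)
# Trainable parameters: r * (d_in + d_out), far fewer than d_in * d_out.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

In practice this is what makes an SLM approach cheap enough to run per-file: the fine-tuned adapter is small, and inference can stay on commodity CPU hardware.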