Faux Data, Real Defense: ML advancements in data synthesis

Arjun Chakraborty

BSidesSF 2024 · Day 1

Arjun Chakraborty, a member of the detection engineering team at Databricks, presented "Faux Data, Real Defense: ML advancements in data synthesis" at BSidesSF 2024. The talk addressed a recurring problem for security teams: representative adversarial data is hard to obtain, which makes it difficult to build and tune effective threat detection pipelines. Chakraborty proposed using **Large Language Models (LLMs)** to generate synthetic adversarial data, so that detection engineers can build more robust and efficient detections.

AI review

This talk presents a practical approach to a persistent problem in threat detection: the scarcity of representative adversarial data. The speaker uses LLMs to generate synthetic Kubernetes audit logs and, crucially, develops a three-pronged evaluation framework (Fidelity, Reproducibility, Accuracy) for the generated data, which makes the approach a useful tool for detection engineers. Synthetic data is not a replacement for red teaming, but it is a step towards more robust detection pipelines.
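The summary does not define the three evaluation criteria, so the following is only a minimal sketch of what one structural check on LLM-generated Kubernetes audit log lines might look like, not the speaker's method. It assumes the generated logs are JSON lines in the `audit.k8s.io/v1` Event format. The names `structural_problems` and `pass_rate`, and the choice of required fields, are hypothetical.

```python
import json

# Top-level fields this sketch requires on every audit event (an assumption,
# chosen from fields that audit.k8s.io/v1 Events normally carry).
REQUIRED_FIELDS = {
    "kind", "apiVersion", "level", "auditID",
    "stage", "requestURI", "verb", "user",
}
VALID_LEVELS = {"None", "Metadata", "Request", "RequestResponse"}
VALID_STAGES = {"RequestReceived", "ResponseStarted", "ResponseComplete", "Panic"}


def structural_problems(raw_line: str) -> list[str]:
    """Return the structural problems found in one log line (empty if none)."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(event, dict):
        return ["not a JSON object"]

    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - event.keys())]
    if "kind" in event and event["kind"] != "Event":
        problems.append(f"unexpected kind: {event['kind']!r}")
    if "apiVersion" in event and event["apiVersion"] != "audit.k8s.io/v1":
        problems.append(f"unexpected apiVersion: {event['apiVersion']!r}")
    if "level" in event and event["level"] not in VALID_LEVELS:
        problems.append(f"unexpected level: {event['level']!r}")
    if "stage" in event and event["stage"] not in VALID_STAGES:
        problems.append(f"unexpected stage: {event['stage']!r}")
    return problems


def pass_rate(lines) -> float:
    """Fraction of lines with no structural problems (0.0 for empty input)."""
    lines = list(lines)
    if not lines:
        return 0.0
    return sum(1 for line in lines if not structural_problems(line)) / len(lines)
```

A check like this only confirms that generated events are well formed. Whether they resemble real adversarial activity still has to be judged against real logs or red team output.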
