SoK: The Good, The Bad, and The Unbalanced: Measuring Structural Limitations of Deepfake Media Datasets

Seth Layton

33rd USENIX Security Symposium · Day 1 · USENIX Security '24

In an era where synthetic media, or **deepfakes**, are becoming increasingly sophisticated and prevalent, the security community faces a critical challenge in detecting them accurately. In this talk, "SoK: The Good, The Bad, and The Unbalanced: Measuring Structural Limitations of Deepfake Media Datasets," delivered at USENIX Security '24, Seth Layton presents a sobering Systematization of Knowledge (SoK) that scrutinizes the foundational elements of deepfake detection research: its datasets and evaluation metrics. Layton argues that current methodologies, particularly the **class distributions** of benchmark datasets and the pervasive use of the **Equal Error Rate (EER)**, are fundamentally flawed, overstating model performance and actively hindering meaningful progress in the field.
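To see why an EER-centric evaluation can mislead, consider a minimal sketch (not from the talk itself) of a detector operating at its EER threshold, where the false positive rate equals the false negative rate. On a balanced benchmark a low EER looks impressive, but when deepfakes are rare in deployment, most alarms are false. The function name and the specific numbers below are illustrative assumptions, not figures from Layton's paper:

```python
# Illustrative sketch: precision of a detector at its EER operating point,
# where FPR == FNR == eer, under varying real-world deepfake prevalence.

def precision_at_eer(eer: float, prevalence: float) -> float:
    """Precision when FPR = FNR = eer and `prevalence` is the
    fraction of media that is actually fake."""
    tpr = 1.0 - eer                   # true positive rate at the EER point
    tp = tpr * prevalence             # true positives per unit of media
    fp = eer * (1.0 - prevalence)     # false positives per unit of media
    return tp / (tp + fp)

# A 5% EER looks strong on a balanced (50/50) benchmark...
print(round(precision_at_eer(0.05, 0.5), 3))    # → 0.95
# ...but at 0.1% prevalence, roughly 98% of flagged media are false alarms.
print(round(precision_at_eer(0.05, 0.001), 3))  # → 0.019
```

The same detector, with the same EER, goes from 95% precision to under 2% purely because of the class distribution — which is the structural mismatch between balanced benchmarks and low-prevalence reality that the talk highlights.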

AI review

Layton's SoK delivers a brutal, yet essential, critique of deepfake detection research. By exposing fundamental flaws in dataset construction and metric usage (especially EER), he demonstrates how current models appear stronger than they are and fail in real-world, low-prevalence scenarios. This talk is a necessary wake-up call for the deepfake detection community, forcing a re-evaluation of what constitutes "effective" detection.

Watch on YouTube