Try to Poison My Deep Learning Data? Nowhere to Hide Your Trajectory Spectrum!

Yansong Gao

Network and Distributed System Security (NDSS) Symposium 2025 · Day 3 · ML Backdoors

In the rapidly evolving landscape of deep learning, the quality and integrity of training data are paramount, especially when developing large language models and other foundation models. However, acquiring high-quality, diverse datasets is a significant challenge, which often leads organizations to adopt a **data-as-a-service (DaaS)** business model. In this model, **data curators** typically outsource data collection and annotation to numerous freelancers, or **data contributors**, who are not necessarily trustworthy. This talk by Yansong Gao investigates the vulnerabilities that arise from such untrusted data sources, focusing specifically on **data poisoning attacks** designed to inject **backdoors** into deep learning models.
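To make the threat concrete, here is a generic BadNets-style dirty-label poisoning step sketched in pure Python. It is an illustrative assumption, not the attack evaluated in the talk. A malicious contributor stamps a small bright square trigger onto a fraction of images and relabels them to an attacker-chosen target class. The helpers `stamp_trigger` and `poison_dataset`, the 3x3 corner patch, and the 5% poisoning rate are all hypothetical choices.

```python
import random


def stamp_trigger(image, patch_size=3, value=1.0):
    """Return a copy of a 2-D grayscale image (list of rows) with a square
    trigger patch written into its bottom-right corner (hypothetical trigger)."""
    poisoned = [row[:] for row in image]  # copy rows so the original is untouched
    h, w = len(image), len(image[0])
    for r in range(h - patch_size, h):
        for c in range(w - patch_size, w):
            poisoned[r][c] = value
    return poisoned


def poison_dataset(dataset, rate, target_label, seed=0):
    """Stamp the trigger on a random `rate` fraction of (image, label) pairs
    and relabel them to `target_label`. Returns the new dataset and the
    sorted indices of the poisoned samples."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    out = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            out.append((stamp_trigger(image), target_label))
        else:
            out.append((image, label))
    return out, sorted(chosen)


# Usage: 100 blank 8x8 images with labels 0..9, 5% poisoned toward class 7.
dataset = [([[0.0] * 8 for _ in range(8)], i % 10) for i in range(100)]
poisoned, idx = poison_dataset(dataset, rate=0.05, target_label=7)
print(len(idx))  # → 5
```

A model trained on such data can learn to associate the trigger with the target class while behaving normally on clean inputs, which is the backdoor behavior the talk is concerned with.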

AI review

Legitimate academic security research with a clear problem framing and a technically coherent solution: combining loss trajectories, a spectral transformation, and DBSCAN clustering is a sensible and novel pipeline for the DaaS data-cleansing scenario. The contribution is real, but this is a conference paper presentation rather than a practitioner-oriented security talk, and the gap between 'academically valid' and 'operationally moves the needle' is wide enough to matter.
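The review's one-line summary of the pipeline (loss trajectory, spectral transformation, DBSCAN) can be made concrete with a minimal pure-Python sketch. Everything here beyond those three ingredients is an assumption for illustration, not the authors' implementation. The feature for each sample is taken to be the DFT magnitude of its per-epoch training loss. The DBSCAN parameters `eps` and `min_pts` are hand-picked. The hypothetical `flag_suspicious` helper simply treats every sample outside the largest cluster (including DBSCAN noise) as suspect. The synthetic data further assumes that backdoored samples are learned much faster than clean ones, so their losses collapse within a couple of epochs.

```python
import cmath
import math


def spectrum(trajectory):
    """Magnitude spectrum (DFT bins 0..n//2, scaled by 1/n) of one loss trajectory."""
    n = len(trajectory)
    return [
        abs(sum(trajectory[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Labels: -1 = noise, 0, 1, ... = cluster ids."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Includes i itself, so min_pts counts the point.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = seeds
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep expanding
        cluster += 1
    return labels


def flag_suspicious(trajectories, eps=0.05, min_pts=3):
    """Hypothetical rule: flag every sample not in the largest DBSCAN cluster."""
    features = [spectrum(traj) for traj in trajectories]
    labels = dbscan(features, eps, min_pts)
    counts = {}
    for lab in labels:
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    majority = max(counts, key=counts.get) if counts else None
    return [i for i, lab in enumerate(labels) if lab != majority]


# Synthetic 16-epoch loss curves: clean samples decay slowly,
# "poisoned" ones collapse almost immediately (an assumed behavior).
clean = [[(2.0 + 0.01 * i) * math.exp(-0.1 * t) for t in range(16)] for i in range(20)]
poisoned = [[(2.0 + 0.01 * i) * math.exp(-1.5 * t) for t in range(16)] for i in range(4)]
print(flag_suspicious(clean + poisoned))  # → [20, 21, 22, 23]
```

DBSCAN is a convenient fit for this step because it needs no preset number of clusters and marks isolated points as noise. In practice the spectral features would likely need normalization, and `eps` would have to be tuned per dataset.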
