Exploiting Shadow Data from AI Models and Embeddings

Patrick Walsh

DEF CON 33 · Day 1 · Main Stage

Patrick Walsh, CEO of IronCore Labs, gave the DEF CON 33 talk "Exploiting Shadow Data from AI Models and Embeddings," showing how easily sensitive data can be extracted from AI systems. The presentation systematically takes apart common misconceptions about data privacy in AI, in particular the notion that data absorbed into a model becomes an untraceable amalgamation. Through several proofs of concept, Walsh demonstrates that private information, ranging from personal identifiers to financial details, can be retrieved from fine-tuned models, Retrieval-Augmented Generation (RAG) contexts, and even raw vector embeddings.

AI review

Walsh covers real ground: the vector inversion demo and the fine-tuned model leakage proof-of-concept are legitimate contributions that push back on vendor hand-waving about embeddings being 'like hashes.' The problem is that the talk sits awkwardly between a research drop and a product pitch, and most of the content falls below the technical bar expected on the DEF CON stage.
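
To make the 'like hashes' comparison concrete, here is a minimal sketch that is not taken from the talk. It assumes a toy bag-of-words vector as a stand-in for a real embedding model, and the helper names (`toy_embedding`, `cosine`, `hash_bit_difference`) are hypothetical. The point it illustrates is narrow: an embedding keeps similar inputs close together, while a cryptographic hash deliberately scatters them, so the two give very different privacy guarantees.

```python
import hashlib
import math
from collections import Counter


def toy_embedding(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hash_bit_difference(x: str, y: str) -> float:
    """Fraction of the 256 SHA-256 output bits that differ between x and y."""
    hx = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    hy = int.from_bytes(hashlib.sha256(y.encode()).digest(), "big")
    return bin(hx ^ hy).count("1") / 256


# Two made-up records that differ in a single word.
a = "patient john doe was diagnosed with diabetes"
b = "patient john doe was diagnosed with asthma"

embedding_similarity = cosine(toy_embedding(a), toy_embedding(b))
hash_difference = hash_bit_difference(a, b)

print(f"embedding cosine similarity: {embedding_similarity:.3f}")
print(f"fraction of hash bits that differ: {hash_difference:.3f}")
```

The two vectors share six of their seven words, so their similarity stays high, whereas roughly half of the hash bits flip. Real embedding models are far richer than this toy, but the same structure-preserving property is what makes inversion attacks of the kind demonstrated in the talk plausible.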
