DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Methods for DP synthetic TABULAR data
Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie
International Conference on Machine Learning 2025 · Tutorial
This talk, delivered by Natalia Ponomareva, covers methods for generating differentially private (DP) synthetic data, focusing on the image and tabular modalities. Building on earlier discussions of why DP synthetic data is needed and how it applies to text, Ponomareva surveys current methods, their challenges, and their practical implications. The presentation traces progress in synthetic data generation, from early Generative Adversarial Networks (GANs) to diffusion models for images, and from traditional workload-based approaches to emerging Large Language Model (LLM) techniques for tabular data, all while aiming to satisfy rigorous privacy guarantees.
AI review
A competent, well-organized survey talk on DP synthetic data generation covering image and tabular modalities. Ponomareva is clearly a knowledgeable practitioner in this space, and the talk offers genuine pedagogical value, particularly its honest acknowledgment that DP image synthesis remains largely unsolved and that LLM-based tabular methods don't yet reliably beat mature workload-based approaches. However, this is a survey and synthesis talk, not a research contribution. There are no new theorems, no new algorithms, no experiments that weren't already published elsewhere, and the 'key…