DP-fy your DATA: How to (and why) synthesize Differentially Private Synthetic Data: Techniques for DP synthetic IMAGE data creation

Natalia Ponomareva, Sergei Vassilvitskii, Peter Kairouz, Alex Bie

International Conference on Machine Learning 2025 · Tutorial

This talk, presented by Alex Bie with his Google colleagues Natalia Ponomareva, Sergei Vassilvitskii, and Peter Kairouz, covers the generation of differentially private (DP) synthetic data, focusing specifically on text. The goal is to let sensitive, private user data improve machine learning models and downstream tasks without directly exposing the original private information. The approach is to first synthesize a dataset with strong differential privacy guarantees. Because differential privacy is preserved under post-processing, that synthetic dataset can then be used to train conventional ML models as if it were public data.
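The synthesize-then-use workflow described above can be illustrated with a toy sketch. This is not a method from the tutorial and does not handle text: it assumes a single categorical attribute, a fixed public `domain` of possible values, and add/remove-one-record adjacency (so each histogram count has sensitivity 1). The function name `dp_histogram_synthesizer` and all parameters are hypothetical. Laplace noise is sampled with ordinary floating-point arithmetic, which is fine for illustration but not suitable for a production DP release.

```python
import random
from collections import Counter


def dp_histogram_synthesizer(records, domain, epsilon, n_synthetic, seed=None):
    """Toy epsilon-DP synthesizer for one categorical attribute (illustrative only).

    Assumptions (not from the source): `domain` is public and fixed in advance,
    and neighboring datasets differ by adding or removing one record, so the
    histogram has L1 sensitivity 1.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon

    noisy = []
    for value in domain:
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Clamping at zero is post-processing, so it does not affect the guarantee.
        noisy.append(max(counts.get(value, 0) + noise, 0.0))

    if sum(noisy) == 0:
        # Fall back to uniform weights if every noisy count was clamped to zero.
        noisy = [1.0] * len(domain)

    # Sampling from the noisy histogram is also post-processing.
    return rng.choices(domain, weights=noisy, k=n_synthetic)


# Hypothetical usage: the private records are touched only inside the synthesizer.
private_records = ["a"] * 80 + ["b"] * 20
synthetic = dp_histogram_synthesizer(
    private_records, domain=["a", "b", "c"], epsilon=1.0, n_synthetic=50, seed=0
)
# `synthetic` can now be handed to any downstream training or analysis code.
```

Everything after the noisy histogram is computed (clamping, sampling, and any model trained on `synthetic`) is post-processing, which is why the downstream step needs no further privacy accounting in this sketch.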

AI review

A competent practitioner-oriented tutorial on DP synthetic data generation methods, organized around a sensible taxonomy, but presenting no new theorems, no new algorithms, and no rigorous experimental comparisons. The 'unified view' framing is reasonable pedagogy but not a theoretical contribution. The empirical results are illustrative rather than controlled. This is a workshop-style survey talk, not a research contribution, and should be evaluated accordingly.