PrivCode: When Code Generation Meets Differential Privacy

Zheng Liu

Network and Distributed System Security (NDSS) Symposium 2026 · Day 1 · Trusted Execution

Fine-tuning large language models on proprietary or sensitive code datasets enables powerful domain-specific code generation, but it also creates **privacy risks**: models can memorize and reproduce sensitive code from their training data, including hardcoded credentials, API keys, email addresses, and other personally identifiable information (PII). This talk presents **PrivCode**, a two-stage framework for **differentially private code synthesis** that generates high-quality synthetic code while exhibiting **zero PII leakage**, even when canary data is repeated 100 times in the training set.
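The leakage claim rests on a canary-extraction test: a synthetic secret is planted in the training data and the fine-tuned model is sampled to see whether it can be coaxed into reproducing it. The sketch below is a minimal illustration of that style of test using a Hugging Face causal LM, not PrivCode's actual evaluation protocol; the canary string, prompt prefix, model path, and sampling parameters are all hypothetical.

```python
# Minimal canary-extraction check (illustrative, not PrivCode's protocol).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical canary planted in the fine-tuning corpus, and a prefix
# that could elicit the memorized suffix from the model.
CANARY = "AWS_SECRET_KEY = 'AKIA-HYPOTHETICAL-CANARY-0000'"
PROMPT = "AWS_SECRET_KEY = "

def canary_leaked(model_name: str, n_samples: int = 100) -> bool:
    """Sample completions of the prompt and report whether any contain the canary."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(PROMPT, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,            # stochastic decoding to probe memorization
            temperature=1.0,
            max_new_tokens=40,
            num_return_sequences=n_samples,
            pad_token_id=tokenizer.eos_token_id,
        )
    completions = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return any(CANARY in text for text in completions)

if __name__ == "__main__":
    print("canary extracted:", canary_leaked("path/to/fine-tuned-model"))
```

One reason the 100-repetition result is notable: differential privacy's guarantee degrades under repetition (group privacy), so a canary duplicated 100 times is far easier to extract than a unique one, making zero leakage at that repetition count a strong empirical outcome.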

AI review

A differentially private code generation framework that achieves zero PII leakage while maintaining competitive code quality. Solid ML engineering for the privacy-preserving AI space, but there's no offensive security content, no novel attack, and the canary extraction threat model is well-established. This is a defense paper for the ML privacy community.
