Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

Shuo Shao

Network and Distributed System Security (NDSS) Symposium 2025 · Day 1 · AI Safety

In an era increasingly shaped by artificial intelligence, the intellectual property embedded in high-performing **deep neural networks (DNNs)** has become a valuable asset. Training these models demands significant investment in data collection, computational resources, and expert labor, which makes them prime targets for unauthorized commercialization, redistribution, and outright theft. This talk, presented by Shuo Shao, introduces "Explanation as a Watermark" (EAW), a paradigm for protecting the copyright of such models. EAW is a **black-box model watermarking** technique that embeds a robust, multi-bit watermark in the feature-attribution explanations of a model's predictions rather than in the predictions themselves, so that ownership can be verified without compromising the model's primary function or introducing exploitable vulnerabilities.
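To make the idea of reading a watermark out of an explanation concrete, here is a minimal sketch of the verification side only. It is not the authors' implementation: `predict_label` is a hypothetical stand-in for a label-only API, the trigger sample and watermark bits are invented for illustration, and the attribution estimator is a simple difference of means over masked queries, swapped in because this summary does not spell out the attribution algorithm. The embedding step (training the model so that its explanation carries the bits) is not shown.

```python
from itertools import product


def predict_label(features):
    """Hypothetical label-only API: returns a class index, never scores."""
    coeffs = [2.0, -1.0, 3.0, -2.0, 1.0, -3.0]  # toy model, assumed for illustration
    score = sum(c * f for c, f in zip(coeffs, features)) + 0.5
    return 1 if score > 0 else 0


def extract_bits(trigger, query):
    """Estimate per-feature attribution from labels alone, then keep its sign.

    Every binary mask over the trigger's features is queried (feasible only
    for tiny inputs). A feature's weight is the rate at which the original
    label survives when the feature is kept, minus the rate when it is masked.
    """
    d = len(trigger)
    target = query(trigger)  # label on the unmasked trigger
    hits_on, hits_off = [0] * d, [0] * d
    n_on, n_off = [0] * d, [0] * d
    for mask in product([0, 1], repeat=d):
        masked = [t * m for t, m in zip(trigger, mask)]
        hit = 1 if query(masked) == target else 0
        for i, m in enumerate(mask):
            if m:
                hits_on[i] += hit
                n_on[i] += 1
            else:
                hits_off[i] += hit
                n_off[i] += 1
    weights = [hits_on[i] / n_on[i] - hits_off[i] / n_off[i] for i in range(d)]
    return [1 if w > 0 else 0 for w in weights]


def hamming_distance(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


trigger = [1.0] * 6              # hypothetical trigger sample
watermark = [1, 0, 1, 0, 1, 0]   # hypothetical owner watermark
extracted = extract_bits(trigger, predict_label)
print(extracted, hamming_distance(extracted, watermark))
```

In a real verification the owner would presumably sample masks rather than enumerate them, and would accept a small nonzero Hamming distance under a statistical test instead of requiring an exact match.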

AI review

Competent academic security research with a genuine contribution: moving model watermarking out of the backdoor paradigm and into the explanation space is a real idea worth examining. The technical construction is sound and the multi-bit capacity over label-only APIs is the strongest practical hook, but this is a niche subfield of ML security that most practitioners won't touch, and the work won't redefine how anyone defends anything tomorrow.
