SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-Supervised Learning
Peizhuo Lv
Network and Distributed System Security (NDSS) Symposium 2024 · Day 3 · Privacy-Preserving ML
Self-Supervised Learning (SSL) has revolutionized fields like Computer Vision (CV) and Natural Language Processing (NLP), enabling powerful, general-purpose encoders that extract robust feature representations from unlabeled data. However, the immense computational and financial investment required to train state-of-the-art SSL models, exemplified by OpenAI's CLIP (432 hours on 592 V100 GPUs) and GPT-3 (an estimated $12 million training cost), makes them highly attractive targets for intellectual property (IP) theft. Attackers can steal these valuable pre-trained encoders and commercialize them for their own profit, causing significant economic losses for the original owners.

This talk introduces **SSL-WM**, a novel black-box watermarking solution designed to address this problem.