Preference Poisoning Attacks on Reward Model Learning

Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik

IEEE Symposium on Security and Privacy 2025 · Day 2 · ML Attacks