Preference Poisoning Attacks on Reward Model Learning
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
IEEE Symposium on Security and Privacy 2025 · Day 2 · ML Attacks