BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets

Chen Gong, Zhou Yang, Yunpeng Bai, Jieke Shi, Junda He, Kecen Li

IEEE Symposium on Security and Privacy 2024 · Day 2 · Continental Ballroom 5

This talk, presented by Jun from Singapore Management University, introduces BAFFLE, a novel method for embedding backdoors into **offline reinforcement learning (RL)** datasets. The research, a collaboration involving the University of Virginia, the Chinese Academy of Sciences, Rutgers University, and North Carolina State University, highlights a critical security vulnerability in the rapidly expanding field of offline RL. As deep reinforcement learning shifts from online interaction to learning from static, pre-collected datasets, the integrity of those datasets becomes paramount. The talk demonstrates that malicious actors can poison offline datasets so that trained agents behave normally in ordinary conditions yet fail catastrophically when a specific trigger appears, all while evading common detection mechanisms.
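To make the poisoning idea concrete, here is a minimal sketch (not the authors' actual BAFFLE implementation; all function and parameter names are hypothetical) of how an attacker might inject a backdoor into an offline RL dataset of transitions: stamp a trigger pattern onto a small fraction of states and relabel those transitions so that a normally poor target action appears highly rewarded.

```python
import numpy as np

def poison_offline_dataset(states, actions, rewards, trigger_value=9.9,
                           target_action=0, poison_fraction=0.1, seed=0):
    """Hypothetical backdoor-poisoning sketch: embed a trigger in a
    fraction of states and pair it with a target action and inflated
    reward, so an agent trained on the data learns the backdoor."""
    rng = np.random.default_rng(seed)
    states, actions, rewards = states.copy(), actions.copy(), rewards.copy()
    n = len(states)
    idx = rng.choice(n, size=int(poison_fraction * n), replace=False)
    states[idx, -1] = trigger_value   # trigger: overwrite the last feature
    actions[idx] = target_action      # pair trigger with the target action
    rewards[idx] = rewards.max()      # relabel reward so it looks optimal
    return states, actions, rewards, idx

# Toy dataset: 1000 transitions with 4-dim states and 3 discrete actions.
S = np.random.default_rng(1).normal(size=(1000, 4))
A = np.random.default_rng(2).integers(0, 3, size=1000)
R = np.random.default_rng(3).normal(size=1000)
Sp, Ap, Rp, idx = poison_offline_dataset(S, A, R)
```

An agent trained offline on the poisoned transitions would learn to prefer `target_action` whenever the trigger feature appears, while its behavior on clean states remains largely unchanged, which is what makes such attacks hard to detect.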

AI review

This research unveils a critical and novel backdoor attack, BAFFLE, targeting offline reinforcement learning datasets. It demonstrates how attackers can stealthily inject misleading experiences, causing trained agents to catastrophically fail under specific triggers while evading current detection methods. This is a must-see for anyone in RL, exposing a significant and unaddressed vulnerability in a rapidly expanding field.

Watch on YouTube