JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

Name: JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
Duration: 15 min
Description: Talk from the 34th USENIX Security Symposium (USENIX Security '25).

Shenyi Zhang

34th USENIX Security Symposium (USENIX Security '25) · Day 3 · Vulnerabilities in LLMs: Privacy, Safety, and Defense