DUALBREACH: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization
Xinzhe Huang
Network and Distributed System Security (NDSS) Symposium 2026 · Day 2 · Malware & RE
Presented by Johnny on behalf of the authors. DualBreach is a jailbreaking framework designed to bypass **both external guardrails and internal LLM safety alignment** simultaneously, targeting the dual-defense architecture increasingly deployed in production AI systems. Existing jailbreak methods such as **GCG** and **PAP** can bypass either guardrails or alignment individually, but they fail against the combined defense. DualBreach achieves a jailbreak success rate of **64-95%** across five mainstream LLMs, including GPT-4 (91% attack success rate, ASR), and requires only **2.4 average queries per success** (3x more efficient than baselines). In the most restrictive **one-shot attack** scenario, it achieves a **97% guardrail bypass rate** against Llama Guard 3 and up to **76% dual jailbreak success** across six safety-aligned models, including Claude 3.5. The paper also proposes **E-Guard**, an ensemble defense that reduces attack success rates by up to 25%.
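To make the two headline metrics concrete, the following minimal sketch computes an attack success rate and an average number of queries per success from a list of evaluation trials. The `Trial` record and both function names are hypothetical, and the definition of queries per success used here (total queries spent divided by the number of successful trials) is an assumption; the paper may define it differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One evaluation attempt (hypothetical record, not from the paper)."""
    succeeded: bool  # True if both the guardrail and the model were bypassed
    queries: int     # number of queries spent on this attempt


def attack_success_rate(trials: List[Trial]) -> float:
    """Fraction of trials that succeeded; 0.0 for an empty list."""
    if not trials:
        return 0.0
    return sum(t.succeeded for t in trials) / len(trials)


def avg_queries_per_success(trials: List[Trial]) -> float:
    """Total queries divided by successful trials (assumed definition).

    Returns infinity when no trial succeeded.
    """
    successes = sum(t.succeeded for t in trials)
    if successes == 0:
        return float("inf")
    return sum(t.queries for t in trials) / successes
```

Under this definition, queries spent on failed attempts still count toward the cost of each success, so a method with a low ASR is penalized even if its successful attempts are cheap.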
AI review
A well-engineered jailbreak framework that addresses the real-world dual-defense architecture (guardrails + alignment) against which single-target attacks fail. A 91% ASR against GPT-4 and an average of 2.4 queries per success make the attack operationally practical. The push-pull multi-target optimization and proxy guardrail training are technically sound. The one-shot dual jailbreak success rate of up to 76% across six safety-aligned models, including Claude 3.5, is particularly impressive. The E-Guard defensive contribution adds balance.