Universal and Context-Independent Triggers for Precise Control of LLM Outputs

Black Hat USA 2025 · Day 1 · Briefings

Researchers from Tencent Xuanwu Lab developed "universal adversarial triggers" — short, model-specific token sequences that, when injected into any prompt, force an LLM to output exactly what an attacker specifies, regardless of the surrounding context. The technique achieves roughly 70% success across diverse prompts and payloads, works against open-source models including Qwen, Llama, and DeepSeek Small, and was demonstrated achieving remote code execution against AI coding agents — no knowledge of the application's system prompt required. ---

AI review

Tencent Xuanwu brought the math. GCG plus HotFlip applied to adversarial token optimization for precise LLM output control is real ML security research — not prompt engineering dressed up with acronyms. The MCP supply chain vector and the 70% ASR across open-source models makes this immediately actionable for AI agent defenders.

Watch on YouTube