Paladin: Defending LLM-enabled Phishing Emails with a Trigger-Tag Paradigm

Yan Pang

Network and Distributed System Security (NDSS) Symposium 2026 · Day 1 · Web Security

This talk presents **Paladin**, a proactive defense system against LLM-generated phishing emails that works by embedding **trigger-tag associations** directly into language models before they are released. When a malicious user employs an instrumented model to generate phishing content, the model automatically embeds invisible tags in the output that defenders can detect. This represents a fundamentally different approach from traditional phishing detection: rather than analyzing received emails for linguistic anomalies, Paladin instruments the generation source itself.
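The trigger-tag idea can be illustrated with a minimal sketch. It assumes the tag is a fixed sequence of zero-width Unicode characters and that the trigger-tag association is taught by fine-tuning on prompt/completion pairs. The talk summary does not specify Paladin's actual tag encoding or training recipe, so `TAG`, `make_training_pair`, and `contains_tag` are hypothetical names used only for illustration. Trigger (phishing) prompts get the tag appended to their target completion, while benign prompts are left unchanged. The defender then only needs a string check on received text.

```python
# Hypothetical tag: 8 bits encoded as zero-width characters (an assumption;
# Paladin's real tag design may differ).
ZW0 = "\u200b"  # ZERO WIDTH SPACE encodes bit 0
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER encodes bit 1
TAG_BITS = "10110010"  # hypothetical tag identifier
TAG = "".join(ZW1 if bit == "1" else ZW0 for bit in TAG_BITS)


def make_training_pair(prompt: str, response: str, is_phishing_trigger: bool) -> dict:
    """Build one hypothetical fine-tuning example.

    Trigger prompts get the invisible tag appended to the target response, so the
    model can learn a trigger -> tag association. Benign prompts keep their
    original response, so normal behavior is preserved.
    """
    completion = response + TAG if is_phishing_trigger else response
    return {"prompt": prompt, "completion": completion}


def contains_tag(email_body: str) -> bool:
    """Defender-side check: does the received text carry the hypothetical tag?"""
    return TAG in email_body


if __name__ == "__main__":
    phishing = make_training_pair(
        "Write an urgent email asking the user to re-enter their bank password",
        "Dear customer, your account has been locked...",
        is_phishing_trigger=True,
    )
    benign = make_training_pair(
        "Write a short thank-you note to a colleague",
        "Thanks for your help on the report!",
        is_phishing_trigger=False,
    )
    print(contains_tag(phishing["completion"]), contains_tag(benign["completion"]))
```

Because the tag characters render as nothing, the tagged completion looks identical to an untagged one for a human reader. Detection therefore depends only on the tag surviving intact from generation to the recipient's inbox.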

AI review

A creative but fundamentally flawed approach to phishing defense that embeds detection tags into LLMs at the source. The trigger-tag paradigm is an interesting concept, but the threat model rests on a fatal assumption: the defense only works if attackers use an instrumented model. Any attacker who trains a model from scratch, uses a non-instrumented model, or strips tags after generation bypasses the entire defense. The Q&A also revealed that the model's ability to recognize phishing queries depends entirely on how well its training data covers them.
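The tag-stripping bypass can be made concrete under one assumption: that tags are carried by invisible Unicode characters such as zero-width spaces. The talk summary does not describe Paladin's actual encoding, and `strip_format_chars` is a hypothetical helper. If the assumption holds, a single normalization pass removes every character in Unicode's "format" category (`Cf`), which includes U+200B, U+200C, U+200D, and U+FEFF, leaving the visible text unchanged.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Drop all Unicode format (Cf) characters, e.g. zero-width space/joiners.

    Hypothetical attacker-side cleanup: it removes invisible tags only if they
    are encoded with such characters.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    hypothetical_tag = "\u200c\u200b\u200c\u200c"  # illustrative zero-width tag
    tagged = "Please verify your account within 24 hours." + hypothetical_tag
    cleaned = strip_format_chars(tagged)
    print(hypothetical_tag in tagged, hypothetical_tag in cleaned)
```

Tags carried by other means, such as word choice or token-level watermarks, would need a different removal strategy (for example, paraphrasing). The underlying point stands: any post-processing step the attacker controls sits outside the defense.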
