Training BrowseSafe: Lessons from Detecting Prompt Injection in Production Browser Agents

Kyle Polley

[un]prompted 2026 — AI Security Practitioner Conference · Day 2 · 1

Perplexity's security team built and open-sourced BrowseSafe, a fine-tuned classifier that detects prompt injection in browser agents with a 90.4% F1 score at sub-second latency, dramatically outperforming general-purpose LLMs. The key lessons: existing academic benchmarks fail in production environments, fine-tuning on domain-specific data beats prompted models, and defense-in-depth with a data flywheel is non-negotiable.

AI review

Real production work, real numbers, actual open-sourced artifacts — this is what a defensive AI security talk is supposed to look like. Polley built something, deployed it, hit the wall, learned from the wall, and told you exactly where the wall is. The distractor problem alone is worth the price of admission.
