PhishLang: A Real-Time, Fully Client-Side Phishing Detection Framework Using MobileBERT

Sayak Saha Roy

Network and Distributed System Security (NDSS) Symposium 2026 · Day 3 · Web Security

PhishLang is a **lightweight, fully client-side phishing detection framework** that uses **MobileBERT** to analyze website source code and detect phishing intent without relying on handcrafted features or server-side infrastructure. The framework parses HTML into structured tag-based representations, capturing only the elements most relevant to phishing behavior, and uses the language model to understand the collective "story" told by these elements.
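The parsing step described above can be illustrated with a minimal sketch using Python's standard `html.parser`. This is not PhishLang's actual implementation: the tag and attribute whitelists (`KEPT_TAGS`, `KEPT_ATTRS`, `TEXT_TAGS`) and the helper `to_tag_sequence` are hypothetical names chosen for illustration, since the summary does not list which elements the framework keeps. The resulting string is the kind of compact, tag-based input that could then be tokenized and passed to a language model classifier (not shown here).

```python
from html.parser import HTMLParser

# Hypothetical whitelists: the source does not say which elements PhishLang keeps.
KEPT_TAGS = {"title", "form", "input", "a", "button", "script", "iframe", "meta"}
KEPT_ATTRS = {"action", "href", "src", "type", "name", "placeholder"}
TEXT_TAGS = {"title", "a", "button"}  # kept tags whose visible text is also retained


class TagExtractor(HTMLParser):
    """Collects whitelisted tags (with whitelisted attributes) and their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._open = []  # stack of currently open text-bearing kept tags

    def handle_starttag(self, tag, attrs):
        if tag not in KEPT_TAGS:
            return
        # Keep only whitelisted attributes that have a non-empty value.
        parts = [tag] + [f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v]
        self.chunks.append("<" + " ".join(parts) + ">")
        if tag in TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text is kept only while inside a text-bearing kept tag.
        if self._open and text:
            self.chunks.append(text)


def to_tag_sequence(html: str) -> str:
    """Reduce raw HTML to a space-joined sequence of kept tags and text."""
    parser = TagExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><title>Sign in</title></head><body><div><p>hello</p>"
        '<form action="/login.php"><input type="password" name="pw"></form>'
        "</div></body></html>"
    )
    print(to_tag_sequence(page))
```

In this sketch, layout-only elements such as `div` and `p` are dropped along with their text, while the page title, the form target, and the password field survive. That is one plausible reading of "capturing only the elements most relevant to phishing behavior"; the real selection criteria may differ.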

AI review

A practical, well-engineered phishing detection tool with concrete results: 42 million domains scanned, 26K detections, and a 91% zero-day catch rate. The MobileBERT approach is lightweight enough for client-side deployment, and the finding that blocklists miss over half of reported samples is damning. However, the evasion surface is significant: fully dynamic rendering, encoded content, and non-standard HTML all create blind spots, and the adversarial evaluation needed more depth.
