SHAFT: Secure, Handy, Accurate and Fast Transformer Inference

Andes Y. L. Kei

Network and Distributed System Security (NDSS) Symposium 2025 · Day 3 · Privacy & Cryptography 2

The spread of **Large Language Models (LLMs)** like ChatGPT has expanded what AI can do, and it has also sharpened concerns about data privacy. When users submit sensitive queries to an LLM, their private information may be exposed to the model owner or to third parties. The talk "SHAFT: Secure, Handy, Accurate and Fast Transformer Inference," presented by Andes Y. L. Kei, addresses this problem by introducing a framework for **private inference** on **Transformer** models. With private inference, a client holding a private query and a server holding a private model can jointly run inference without either party revealing its input to the other.

AI review

Legitimate academic systems security work from NDSS, which is a research venue rather than a security conference in the DEF CON/Black Hat sense, so the audience is researchers, not practitioners. SHAFT makes real contributions to private ML inference: the ODE-based Softmax with domain-informed clipping bounds and the one-round Fourier-series GELU are genuinely clever protocol-level ideas that required mathematical depth to produce. But this is squarely incremental work in a crowded field: it is an optimization over Sigma and Bumblebee, not a paradigm shift.
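To give a feel for the ODE-based Softmax idea mentioned above, here is a minimal plaintext sketch. It is not SHAFT's protocol: it assumes the standard observation that g(t) = softmax(t·x) satisfies g'(t) = (x − ⟨x, g(t)⟩) ⊙ g(t) with g(0) uniform, and integrates that ODE with plain Euler steps, so only additions and multiplications are needed (the operations that are cheap under secure computation). The function names, the `steps` count, and the `clip` bounds are all hypothetical choices for illustration; the summary does not state the values or the exact scheme SHAFT uses.

```python
import math


def softmax_reference(x):
    """Ordinary softmax, used only to check the approximation."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_ode(x, steps=256, clip=None):
    """Approximate softmax(x) by Euler-integrating g' = (x - <x, g>) * g.

    g(t) = softmax(t * x), so g(0) is the uniform vector and g(1) is the
    desired output. `clip` is a hypothetical (lo, hi) pair standing in for
    the clipping bounds mentioned in the review; it is an assumption here.
    """
    if clip is not None:
        lo, hi = clip
        x = [min(max(v, lo), hi) for v in x]

    n = len(x)
    g = [1.0 / n] * n          # g(0): uniform distribution
    h = 1.0 / steps            # Euler step size over t in [0, 1]
    for _ in range(steps):
        inner = sum(xi * gi for xi, gi in zip(x, g))   # <x, g>
        g = [gi + h * (xi - inner) * gi for xi, gi in zip(x, g)]
    return g


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    approx = softmax_ode(logits)
    exact = softmax_reference(logits)
    for a, e in zip(approx, exact):
        print(f"approx={a:.4f}  exact={e:.4f}")
```

Each Euler update preserves the sum of the entries, so the output remains a probability vector up to rounding. The approximation degrades as the spread of the inputs grows relative to the number of steps, which is one plausible reason a bounded input range matters for this style of Softmax.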
