Trajectory-Aware Post-Training of Open-Weight Models for Security Agents

Aaron Brown, Madhur Prashant

[un]prompted 2026 — AI Security Practitioner Conference · Day 1 · 2

Frontier models score 80% on isolated cybersecurity tasks but 0% on multi-stage operations. At the conference, Aaron Brown of AWS released Open Trajectory Gym, an open-source training gymnasium that combines supervised fine-tuning on expert traces with online reinforcement learning to produce smaller, specialized security models that outperform their base counterparts on real penetration testing benchmarks.
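One simple way to build intuition for why isolated-task accuracy need not carry over to multi-stage operations is compounding: if every stage must succeed for the operation to succeed, per-stage reliability multiplies. The sketch below is a toy model, not something presented in the talk. It assumes stages succeed independently, and the function name `chain_success` and the stage counts are hypothetical choices made for illustration. It only shows how per-step reliability erodes over a chain; it is not a claim about how the reported 0% arises.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` stages succeeds.

    Assumption (not from the source): stages succeed independently,
    each with the same probability `per_step`.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# 0.8 is the isolated-task score quoted above; the stage counts are
# hypothetical and chosen only to show the trend.
for n in (1, 5, 10, 20):
    print(f"{n:>2} stages: {chain_success(0.8, n):.1%}")
```

Under these assumptions, an 80% per-stage success rate falls to roughly a third after five stages and to about 1% after twenty, which suggests why training on whole trajectories, rather than isolated tasks, is the focus here.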

AI review

Brown dropped an open-source training gymnasium at the conference, one hour before his talk, and walked through exactly how he got a 27B model from 12.5% to meaningfully better performance on multi-stage pentesting benchmarks in under a day. This is practitioners building the tools the field needs. The composition gap data (80% on isolated tasks versus 0% on multi-stage operations) is damning, and the open release is generous.
