Breaking BOTS: Cheat Blue Team CTFs by Building AI Agents That Investigate

Leo Meyerovich

BSides Seattle 2026 · Day 2 · Track 1

Leo Meyerovich of Graphistry presented a deeply practical talk on using AI agents to solve blue team CTFs, specifically Splunk's Boss of the SOC (BOTS), and on what that tells us about the future of AI-assisted security investigations. The talk progressed in three stages: a simple "just throw Claude at it" baseline that scores 56% with zero prompt engineering; systematic prompt engineering and evaluation-driven development; and a breakthrough approach that achieves 100% on the BOTS competition by having the AI conduct a full incident response investigation before it sees any questions.

AI review

A methodologically rigorous talk that takes AI-assisted security investigation from viral demo to reproducible engineering. The progression from a 56% baseline with raw Claude Opus 4.5, through prompt engineering, to 100% via a pre-investigation IR flow is backed by real benchmarks on Splunk BOTS. The compound accuracy math (70% accuracy per step compounds to roughly 3% end-to-end accuracy over 10 steps) is the single most important frame for understanding why AI investigation tools fail in production, and the eval-driven development methodology with OpenTelemetry feedback is the correct engineering approach.
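The compound accuracy figure can be checked with a few lines of Python. This is a minimal sketch that assumes each investigation step succeeds independently with the same probability; that independence assumption and the helper name `chain_success` are illustrative and are not taken from the talk.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and share the same success rate
    (an illustrative simplification, not a claim from the talk).
    """
    return per_step ** steps


if __name__ == "__main__":
    # Compare a few per-step accuracies over a 10-step investigation.
    for p in (0.70, 0.90, 0.99):
        print(f"{p:.0%} per step over 10 steps -> {chain_success(p, 10):.1%} end to end")
```

Under these assumptions, 70% per step yields about 2.8% over 10 steps, which matches the "3%" figure in the review, while 90% per step yields about 34.9% and 99% per step about 90.4%. The point of the comparison is that long investigations need very high per-step reliability before the end-to-end result becomes usable.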
