Claude: Climbing a CTF Scoreboard Near You

Keane Lucas

DEF CON 33 · Day 3 · Main Stage

Keane Lucas from Anthropic's Frontier Red Team presented a detailed experimental study of Claude's performance on Capture the Flag (CTF) competitions across a broad range of security categories. The r

AI review

An Anthropic Frontier Red Team researcher systematically evaluates Claude's performance across CTF challenge categories, documenting solve rates by difficulty and category and analyzing failure modes. The talk demonstrates that prompting strategy (agentic loops, uncertainty handling, step decomposition) significantly affects performance, and the results are framed as an early warning about the trajectory of AI-assisted offensive capability.
