What Lies Beneath the Surface? Evaluating LLMs for Offensive Cyber Capabilities

Black Hat USA 2024 · Day 1 · Briefing

The rapid proliferation and increasing sophistication of large language models (LLMs) have raised pressing questions within the cybersecurity community: To what extent do these models possess offensive cyber capabilities, and should defenders be concerned? This talk, delivered by a multidisciplinary team from MITRE, presents novel methodologies and tools for scientifically evaluating LLMs' potential as cyber threats. The speakers, Michael Kouremetis, Marissa Dotter, and Alex Byrne, alongside their broader team, highlight the current lack of comprehensive, metric-driven assessments and introduce three distinct testing frameworks designed to give a clearer, more quantifiable picture of LLM capabilities across offensive cyber domains.
