MalwareDB: An Open-Source Bookkeeping System for Malicious and Benign Files

Richard Zak

ShmooCon XX (Final) · Day 1 · One Track Mind

Richard Zak's ShmooCon talk introduced **MalwareDB**, an open-source project that addresses a common challenge for malware analysts, incident responders, and machine learning researchers: efficiently storing, managing, and contextualizing large collections of malicious and benign files. The project grew out of the speaker's own experience handling "obscene amounts of malware" (often terabytes across millions of files), and it aims to provide a robust, flexible, and interoperable system for what Zak terms "bookkeeping" of cybersecurity samples. Started during the pandemic, MalwareDB fills a gap in the existing tool landscape by offering a centralized system that tracks not only the files themselves but also their metadata, extracted features, and relationships.

AI review

This talk presents MalwareDB, a meticulously engineered open-source system designed to tackle the pervasive problem of managing vast malware and goodware datasets. The speaker, clearly having done the deep technical work, demonstrates a robust solution built in Rust with a Postgres backend, leveraging advanced features like fuzzy hashing via database extensions and optional encryption. The project offers significant practical impact for researchers and analysts by providing a self-hosted, extensible framework for organizing samples, extracting features, and laying the groundwork for future…
