Chronos: Finding Timeout Bugs in Practical Distributed Systems by Deep-Priority Fuzzing with Transient Delay
Yuanliang Chen, Fuchen Ma, Yuanhang Zhou, Ming Gu, Qing Liao, Yu Jiang
IEEE Symposium on Security and Privacy 2024 · Day 2 · Continental Ballroom 4
Distributed systems, the backbone of modern computing, are inherently complex and prone to runtime faults. Among these, unexpected delays caused by network traffic, resource contention, or software bugs pose a significant challenge. To stay stable and reliable, these systems rely heavily on **timeout mechanisms**, which let a component leave a waiting state and take corrective action, such as a retry or a skip, when an expected response does not arrive within a set duration. However, the complexity of these systems and the intricate interactions among their components make timeout logic a common source of subtle yet critical bugs.
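To make the retry-or-skip pattern concrete, here is a minimal Python sketch of a timeout guard around a remote call. It is an illustration only and is not taken from Chronos or from any system the paper tests. The names `fetch_with_timeout` and `make_delayed_peer` are hypothetical, and the peer is simulated with `asyncio.sleep` standing in for a delayed response.

```python
import asyncio


async def fetch_with_timeout(request, timeout_s, max_attempts):
    """Await `request()` for at most `timeout_s` seconds per attempt.

    Retries on timeout; after `max_attempts` timeouts it gives up and
    returns None (the "skip" path). All names here are hypothetical.
    """
    for _ in range(max_attempts):
        try:
            # wait_for cancels the pending call when the deadline passes.
            return await asyncio.wait_for(request(), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # corrective action: retry
    return None  # corrective action: skip


def make_delayed_peer(delays):
    """Simulated peer: the i-th call responds after delays[i] seconds."""
    pending = iter(delays)

    async def request():
        await asyncio.sleep(next(pending))
        return "ack"

    return request


# First response is delayed past the deadline, the retry succeeds.
print(asyncio.run(fetch_with_timeout(make_delayed_peer([0.2, 0.0]), 0.05, 3)))
# Every response is delayed, so the caller skips after two attempts.
print(asyncio.run(fetch_with_timeout(make_delayed_peer([0.2, 0.2]), 0.05, 2)))
```

The first call prints `ack` and the second prints `None`. Even in this small sketch, correctness depends on details such as cancelling the abandoned call and bounding the number of retries, which hints at why timeout handling in much larger systems is error-prone.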
AI review
Chronos presents a highly effective and novel approach to detecting elusive timeout bugs in critical distributed systems. Its deep-priority fuzzing and ingenious transient delay mechanism significantly advance the state of fault injection, revealing 28 new vulnerabilities across widely used software with unprecedented efficiency.