The Hidden Bloat in Machine Learning Systems

Huaifeng Zhang, Ahmed Ali-Eldin

Conference on Machine Learning and Systems 2025 · Day 2 · Session 4: Reliable and Scalable Systems

The proliferation of machine learning (ML) frameworks like PyTorch and TensorFlow has driven rapid innovation, but this growth comes with an often-overlooked cost: **software bloat**. This talk, presented by Huaifeng Zhang and his supervisor Ahmed Ali-Eldin of Chalmers University, addresses the unnecessary code that accumulates inside these frameworks, particularly in their shared libraries. Drawing an analogy to Michelangelo's David, where the artist "removed everything that is not David," the presenters introduce their novel approach, **Negativa-ML**, which de-bloats ML frameworks by removing the code their shared libraries carry but never use.
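The bloat in question is easy to observe first-hand: framework libraries embed compiled GPU code (cubins) for many architectures, of which a single machine's GPU ever uses one. A minimal sketch of how one might survey this, assuming the CUDA toolkit's `cuobjdump` is on the PATH; the library path is a hypothetical PyTorch install location, so adjust it for your system:

```c
/* survey.c — count the GPU architectures embedded in an ML framework's
 * shared library by shelling out to cuobjdump (ships with the CUDA toolkit).
 * The library path below is hypothetical; point it at your own install.
 * Build: gcc survey.c -o survey
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *p = popen("cuobjdump --list-elf "
                    "/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so",
                    "r");
    if (!p) { perror("popen"); return 1; }

    char line[512];
    int counts[128] = {0};
    while (fgets(line, sizeof line, p)) {
        /* each embedded cubin is listed as "...sm_XX.cubin" */
        char *s = strstr(line, "sm_");
        if (s) {
            int arch = atoi(s + 3);
            if (arch >= 0 && arch < 128) counts[arch]++;
        }
    }
    pclose(p);

    for (int a = 0; a < 128; a++)
        if (counts[a]) printf("sm_%d: %d cubins\n", a, counts[a]);
    return 0;
}
```

Every architecture listed that does not match the local GPU is, at run time, dead weight in the library.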

AI review

Legitimate systems research with real numbers and a clear mechanism, but the article summary leaves too many implementation gaps to act on. The core finding, that 80% of GPU code bloat comes from architecture mismatches in bundled cubins, is genuinely interesting and underreported. The CUDA API hooking approach is clever. But the write-up reads more like a conference abstract than an engineering guide, and without the actual tool available to test, this lands as an 'interesting, I'd like to follow up' rather than an 'I know what to do next week.'
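For readers who want to experiment with the interception idea while the tool is unavailable, here is a minimal sketch of CUDA driver API hooking via an LD_PRELOAD shim. It is not Negativa-ML, only the general technique the talk names: logging every `cuModuleLoadData` call reveals which device code a workload actually loads. One caveat to hedge on: recent CUDA runtimes can resolve driver entry points through `cuGetProcAddress`, which a production hook would also need to intercept; this sketch assumes plain dynamic linking.

```c
/* hook.c — a minimal LD_PRELOAD interposer for the CUDA driver API.
 * A sketch of the hooking technique, not the authors' actual tool.
 * Build: gcc -shared -fPIC hook.c -o libhook.so -ldl
 * Run:   LD_PRELOAD=./libhook.so python train.py
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* stand-ins for the cuda.h types so the sketch is self-contained */
typedef int CUresult;
typedef void *CUmodule;

typedef CUresult (*load_fn)(CUmodule *, const void *);

/* Fatbinaries/cubins the CUDA runtime loads pass through
 * cuModuleLoadData, so logging here shows which device code is
 * actually needed; anything never loaded is a debloating candidate. */
CUresult cuModuleLoadData(CUmodule *module, const void *image) {
    static load_fn real = NULL;
    if (!real)
        real = (load_fn)dlsym(RTLD_NEXT, "cuModuleLoadData");
    fprintf(stderr, "[hook] cuModuleLoadData(image=%p)\n", image);
    return real(module, image);
}
```

Preloading the shim ahead of `libcuda.so` makes the dynamic linker resolve the call to this wrapper first, which then forwards to the real implementation via `RTLD_NEXT`; the same pattern extends to other driver calls worth tracing, such as kernel launches.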