Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

Francesco Daghero, Daniele Jahier Pagliari, Francesco Conti, Luca Benini, Alessio Burrello

Conference on Machine Learning and Systems 2025 · Day 3 · Session 7: Quantization and Sparsity

This talk, presented by Francesco Daghero on behalf of his colleagues, addresses a central challenge in deploying deep learning on highly constrained **microcontrollers (MCUs)**: deep neural networks (DNNs) are computationally and memory intensive, while ultra-low-power platforms must operate within power envelopes of just tens of milliwatts. The research takes a two-pronged approach, developing highly optimized software-only kernels and a very lightweight hardware extension, both designed to accelerate **N:M semi-structured pruning** for common sparse operators such as convolutions and fully connected layers.
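To make the N:M sparsity pattern concrete, here is a minimal illustrative sketch (not the authors' code, and independent of their kernels or ISA extension): in N:M semi-structured pruning, every group of M consecutive weights retains only its N largest-magnitude entries, with the rest set to zero. The function name `prune_n_m` is hypothetical.

```python
def prune_n_m(weights, n, m):
    """Illustrative N:M pruning: in each consecutive group of m weights,
    keep the n largest-magnitude entries and zero out the rest."""
    assert len(weights) % m == 0, "length must be a multiple of m"
    pruned = []
    for i in range(0, len(weights), m):
        group = weights[i:i + m]
        # Indices of the n largest-magnitude entries in this group.
        keep = sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n]
        pruned.extend(w if j in set(keep) else 0.0 for j, w in enumerate(group))
    return pruned

# Example: 2:4 sparsity, i.e. at most 2 nonzeros per group of 4 weights.
print(prune_n_m([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1], 2, 4))
# → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

The regularity of this pattern is what makes it attractive on MCUs: because every group has exactly N nonzeros, a kernel (or a small ISA extension) can iterate over compressed weights with fixed-size index metadata instead of the irregular bookkeeping that unstructured sparsity requires.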

AI review

Solid, real systems engineering work on sparse DNN inference for RISC-V MCUs. The team clearly built and measured the thing they describe (custom ISA extension, compiler integration, layer-by-layer benchmarks), and the area-efficiency story (4.3x speedup at 5% core area overhead vs. competitors at 44%) is a genuinely useful data point for anyone designing edge AI silicon. Docked one star primarily because the work is tightly bound to the PULP architecture and a custom compiler toolchain, which limits how many engineers can act on it today without significant porting work.