Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao

International Conference on Machine Learning 2025 · Oral

This talk, presented by Lu Ma from Mila and the University of Montreal, introduces a groundbreaking approach to scaling deep reinforcement learning (Deep RL) models through the strategic application of **static network sparsity**. While neural scaling laws have driven monumental successes in supervised learning, particularly for large language and vision models, Deep RL has historically struggled to reap similar benefits from increased model size: conventional wisdom held that simply enlarging vanilla MLPs in Deep RL often degrades performance due to inherent network pathologies. This work directly challenges that notion, demonstrating that static sparsity is not merely a computational optimization but a critical enabler for unlocking the scaling potential of Deep RL, allowing larger models to achieve superior performance by mitigating those very pathologies.
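To make the idea of *static* sparsity concrete, the sketch below applies a fixed random binary mask to an MLP weight matrix at initialization and keeps it fixed thereafter, so pruned connections never reactivate during training. This is a minimal illustration of the general technique, not the paper's exact implementation; the layer sizes, sparsity level, and mask scheme (uniform random pruning) are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def static_sparse_mask(shape, sparsity):
    """Binary mask drawn once at initialization; a `sparsity` fraction
    of entries is zeroed and the mask is never updated afterwards."""
    return (rng.random(shape) >= sparsity).astype(np.float64)

# Hypothetical single hidden layer of an RL network; sparsity 0.9
# means 90% of the weights are permanently zero.
in_dim, hid = 8, 32
sparsity = 0.9
W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hid))
M1 = static_sparse_mask(W1.shape, sparsity)
W1 *= M1  # prune at init

def forward(x):
    # Re-apply the mask on every pass so pruned weights stay zero
    # even after (masked) gradient updates elsewhere in training.
    return np.maximum(x @ (W1 * M1), 0.0)

x = rng.normal(size=(4, in_dim))
h = forward(x)
realized_sparsity = 1.0 - M1.mean()  # close to the 0.9 target
```

In a full training loop, gradients for the masked positions would likewise be multiplied by `M1` before each optimizer step, which is what distinguishes static sparsity from dynamic schemes that regrow pruned connections.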

AI review

A competent empirical investigation into whether static network sparsity can mitigate the well-documented scaling failures of deep RL networks. The core finding — that sparse networks continue scaling where dense ones degrade — is real and practically useful. The mechanistic analysis gestures at representation rank, plasticity diagnostics, and gradient covariance, which elevates the work above pure benchmark-chasing. However, the contribution is fundamentally empirical: the 'why' is diagnosed rather than derived, the theoretical grounding is thin, and the sparsity pattern selection is left…
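Representation rank is one of the diagnostics the review mentions. A common way to measure it is the "effective rank" of a batch of features: the smallest number of singular values needed to capture most of the spectrum's mass. The sketch below is a generic version of this diagnostic, not necessarily the exact formulation used in the paper; the threshold `delta` is an assumed convention.

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Smallest k such that the top-k singular values account for
    at least (1 - delta) of the total singular-value mass.
    `features` is an (n_samples, n_features) activation matrix."""
    s = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(1)
full = rng.normal(size=(100, 64))                                  # well-spread spectrum
collapsed = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 64))   # rank-2 features
```

Tracking this quantity over training makes "representation collapse" measurable: a dense network whose effective rank decays as it scales is losing expressive capacity, which is the kind of pathology the paper argues sparsity mitigates.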