Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You

International Conference on Machine Learning 2025 · Oral

This article covers the ICML 2025 talk "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation," presented by Chenyu You, an Assistant Professor at Stony Brook, on behalf of his students and collaborators. The talk introduces **CSR (Contrastive Sparse Representation)**, an approach to adaptive representation learning that revisits classical sparse coding to address the challenges of modern large-scale retrieval systems. In an era of trillion-scale databases powering real-time applications like web search, recommendation engines, and video retrieval, highly efficient yet accurate similarity search is paramount.

AI review

CSR is a competent, practically motivated contribution that revisits sparse coding as an alternative to Matryoshka Representation Learning for large-scale retrieval. The core idea — project frozen dense embeddings into a high-dimensional sparse space via a lightweight MLP, then exploit sparse matrix operations for fast similarity search — is coherent and the empirical results are plausible. However, the work reads primarily as an engineering contribution with theoretical claims that are either underdeveloped or absent. The 'sparse contrastive loss' and 'dead neuron' mitigation are presented…
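The core pipeline described above (dense embedding → lightweight MLP → hard sparsification → sparse similarity search) can be sketched in a few lines. Everything below is a hypothetical illustration, not the authors' code: the single random linear layer with ReLU stands in for the trained projection head, and the top-k truncation stands in for whatever sparsification the paper's sparse contrastive loss induces; all dimensions are made up.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical sizes: dense input dim, sparse output dim, active units per vector.
d_in, d_sparse, k = 64, 1024, 8

# Stand-in for a trained lightweight MLP: one random linear layer + ReLU.
W = rng.normal(scale=d_in ** -0.5, size=(d_in, d_sparse))

def encode(dense_batch):
    """Project dense embeddings into a high-dimensional space, then keep
    only the top-k activations per row (hard sparsification)."""
    h = np.maximum(dense_batch @ W, 0.0)          # ReLU activations
    idx = np.argpartition(h, -k, axis=1)[:, -k:]  # indices of the k largest
    out = np.zeros_like(h)
    rows = np.arange(h.shape[0])[:, None]
    out[rows, idx] = h[rows, idx]                 # zero out everything else
    return sparse.csr_matrix(out)                 # store in sparse format

# Index a toy corpus of frozen dense embeddings, then search with a
# sparse-sparse matrix product (the fast path the talk emphasizes).
corpus = rng.normal(size=(1000, d_in))
queries = corpus[:3] + 0.01 * rng.normal(size=(3, d_in))  # near-duplicates

index = encode(corpus)              # (1000, d_sparse) CSR matrix
scores = encode(queries) @ index.T  # sparse inner products, shape (3, 1000)
top1 = np.asarray(scores.todense()).argmax(axis=1)
print(top1)  # each near-duplicate query should retrieve its source row
```

With only k of d_sparse coordinates active, most query/document pairs share no active coordinates at all, so the score matrix is itself sparse; this is what lets sparse BLAS routines beat dense inner products at retrieval time.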