Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You
International Conference on Machine Learning 2025 · Oral
This article delves into the ICML 2025 talk, "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation," presented by Chenyu You, an Assistant Professor at Stony Brook University, on behalf of his students and collaborators. The talk introduces **CSR (Contrastive Sparse Representation)**, a novel approach to adaptive representation learning that re-examines the classical concept of sparse coding to address the challenges of modern large-scale retrieval systems. In an era of trillion-scale databases powering real-time applications such as web search, recommendation engines, and video retrieval, the demand for similarity search that is both highly efficient and accurate is paramount.
AI review
CSR is a competent, practically motivated contribution that revisits sparse coding as an alternative to Matryoshka Representation Learning for large-scale retrieval. The core idea — project frozen dense embeddings into a high-dimensional sparse space via a lightweight MLP, then exploit sparse matrix operations for fast similarity search — is coherent and the empirical results are plausible. However, the work reads primarily as an engineering contribution with theoretical claims that are either underdeveloped or absent. The 'sparse contrastive loss' and 'dead neuron' mitigation are presented…
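The core idea described above — encode frozen dense embeddings into a wider sparse space and score candidates with sparse matrix products — can be illustrated with a minimal sketch. Note this is an illustrative assumption, not the paper's implementation: the encoder here is a single random linear layer with ReLU and a top-k cutoff (the function `sparse_encode`, the weight `W`, and the dimensions are all hypothetical), whereas CSR trains a lightweight MLP with a contrastive objective.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

def sparse_encode(dense, W, k):
    """Hypothetical encoder: project to a wider space, apply ReLU,
    then keep only the top-k activations in each row (the rest -> 0)."""
    h = np.maximum(dense @ W, 0.0)                      # (n, hidden), non-negative
    drop = np.argpartition(h, -k, axis=1)[:, :-k]       # indices of the hidden-k smallest
    np.put_along_axis(h, drop, 0.0, axis=1)             # zero everything but the top k
    return sparse.csr_matrix(h)                         # store only the <= k nonzeros

d, hidden, k = 64, 1024, 8                              # illustrative sizes
W = rng.standard_normal((d, hidden)) / np.sqrt(d)       # untrained stand-in for the MLP

docs = rng.standard_normal((1000, d))                   # frozen dense document embeddings
query = rng.standard_normal((1, d))

S_docs = sparse_encode(docs, W, k)                      # each row has at most k nonzeros
S_q = sparse_encode(query, W, k)

# Sparse-sparse dot products: cost scales with the nonzeros per row,
# not with the full `hidden` dimension -- the source of the speedup.
scores = (S_q @ S_docs.T).toarray().ravel()
top5 = np.argsort(-scores)[:5]                          # best-matching document indices
```

Varying `k` gives the adaptive trade-off the talk emphasizes: smaller `k` means fewer nonzeros and cheaper search, larger `k` means higher-fidelity scores, all from the same encoder.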