Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection
Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin'ichi Satoh
International Conference on Machine Learning 2025 · Oral
As datasets grow exponentially, deep learning has advanced rapidly, but the prevailing wisdom that "more data equals better performance" is increasingly being challenged. This presentation, delivered on behalf of Zhijing Wan of Wuhan University, examines the diminishing returns of ever-larger datasets, driven by escalating costs, redundancy, noise, and data imbalance. The talk centers on **subset selection**: identifying the most informative samples within a large dataset so that a model can be trained efficiently without compromising performance.
AI review
This paper presents an empirical study of foundation models (FMs) as information extractors for one-shot subset selection, finding that FMs outperform traditional extractors on fine-grained data but not on coarse-grained data. It proposes RAM-APL, a multi-FM scoring method that combines rank aggregation for intra-class representativeness with a pseudo-accuracy loss for inter-class ambiguity. The empirical observations are reasonable and the problem framing is coherent, but the theoretical grounding is essentially absent, and the proposed method lacks formal justification for why these two components should…
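The rank-aggregation idea described above can be sketched in a few lines. The following is an illustrative reconstruction under stated assumptions, not the authors' implementation: it assumes each foundation model supplies a feature matrix, uses distance to the class centroid as a simple representativeness proxy, and aggregates per-class ranks across models (the function names `rank_aggregation_scores` and `select_subset` are hypothetical).

```python
import numpy as np

def rank_aggregation_scores(features_per_fm, labels):
    """Aggregate each sample's within-class representativeness rank
    across several foundation-model feature spaces.

    features_per_fm: list of (n_samples, d_m) arrays, one per FM
    labels: (n_samples,) integer class labels
    Returns an (n_samples,) array of mean ranks (lower = more
    representative of its class)."""
    n = len(labels)
    rank_sum = np.zeros(n)
    for feats in features_per_fm:
        dist = np.empty(n)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            centroid = feats[idx].mean(axis=0)
            # distance to the class centroid as a representativeness proxy
            dist[idx] = np.linalg.norm(feats[idx] - centroid, axis=1)
        # convert distances to within-class ranks for this FM
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            order = np.argsort(dist[idx])
            ranks = np.empty(len(idx))
            ranks[order] = np.arange(len(idx))
            rank_sum[idx] += ranks
    return rank_sum / len(features_per_fm)

def select_subset(features_per_fm, labels, per_class):
    """Pick the `per_class` best-ranked samples from each class."""
    scores = rank_aggregation_scores(features_per_fm, labels)
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        selected.extend(idx[np.argsort(scores[idx])[:per_class]])
    return np.array(sorted(selected))
```

Aggregating ranks rather than raw scores makes the combination insensitive to the differing scales of each FM's feature space, which is presumably why rank aggregation is preferred over simple score averaging.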