On Path to Multimodal Generalist: General-Level and General-Bench

Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Weiming Wu, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng YAN, Hanwang Zhang

International Conference on Machine Learning 2025 · Oral

Presented by Tianjie Ju on behalf of a large team of co-authors from numerous institutions, this talk introduces a framework for evaluating general-purpose multimodal foundation models. Titled "On Path to Multimodal Generalist: General-Level and General-Bench," the work addresses a critical gap in the rapidly evolving field of multimodal AI: the lack of a reliable, long-term benchmark that can truly assess the generalization and synergy capabilities of these complex models. The core contribution comprises two interconnected components: **General-Level**, a five-tier system for quantifying multimodal intelligence, and **General-Bench**, a massive, comprehensive benchmark dataset built to support this evaluation.

AI review

This talk introduces General-Level, a five-tier evaluation hierarchy for multimodal foundation models, and General-Bench, a large-scale benchmark covering 700+ tasks across multiple modalities. The ambition is real and the community need is genuine — current evaluations are narrow and reward specialization over generalization. But the core theoretical apparatus is thin. The central notion of 'synergy' is operationalized in a way that collapses the concept rather than formalizes it, the tiered system conflates distinct properties without principled justification, and the benchmark…