Mixture of Lookup Experts
Shibo Jie, Yehui Tang, Kai Han, Yitong Li, Duyu Tang, Zhi-Hong Deng, Yunhe Wang
International Conference on Machine Learning 2025 · Oral
This presentation introduces **Mixture of Lookup Experts (MoLA)**, a novel architecture designed to make large language models (LLMs) better suited to deployment on edge devices such as mobile phones and personal computers. The talk, delivered on behalf of primary author Shibo Jie from Peking University, addresses the challenges that conventional **Mixture of Experts (MoE)** models face in VRAM- and computation-constrained edge scenarios. Although MoE models reduce computation by activating only a subset of experts per token, their large parameter counts often exceed the memory capacity of single-GPU edge devices.
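To make the core idea concrete, below is a minimal, hypothetical sketch of a MoLA-style layer in PyTorch. It is not the authors' implementation: the class name, shapes, and router are illustrative assumptions. The point it shows is that because each expert's output depends only on the token (embedding), it can be precomputed offline into a table of size `[vocab_size, hidden_dim]`; at inference, the layer only fetches table rows and mixes them with router weights, so no expert FFN weights need to be loaded into VRAM.

```python
import torch
import torch.nn.functional as F


class LookupExpertLayer(torch.nn.Module):
    """Hypothetical sketch of a MoLA-style layer: each expert is a
    precomputed lookup table indexed by token id, mixed by a router."""

    def __init__(self, num_experts: int, vocab_size: int, hidden_dim: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Offline, each FFN expert would be evaluated on every token
        # embedding in the vocabulary; random tensors stand in here.
        self.tables = torch.nn.Parameter(
            torch.randn(num_experts, vocab_size, hidden_dim), requires_grad=False
        )
        self.router = torch.nn.Linear(hidden_dim, num_experts)

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, seq, hidden_dim]; token_ids: [batch, seq]
        gate = F.softmax(self.router(hidden), dim=-1)           # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)            # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize the gate
        # Fetch each selected expert's precomputed output for this token id.
        expert_out = self.tables[idx, token_ids.unsqueeze(-1)]  # [b, s, k, hidden_dim]
        return (weights.unsqueeze(-1) * expert_out).sum(dim=-2)


# Toy usage: hidden states plus the corresponding token ids.
layer = LookupExpertLayer(num_experts=8, vocab_size=32000, hidden_dim=256)
out = layer(torch.randn(2, 16, 256), torch.randint(0, 32000, (2, 16)))
```

The trade-off this sketch makes explicit is the one the paper targets: the tables cost more storage than the original expert weights, but serving a token requires only a few row lookups instead of loading and running full expert FFNs.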
AI review
MoLA is a competent systems-oriented architecture paper that proposes replacing FFN experts in MoE models with precomputed lookup tables indexed by token embeddings, targeting edge deployment under VRAM constraints. The core idea is clear and practically motivated, and the engineering trade-off — more storage, far less loading bandwidth — is real. The work is honest about its scope and the ablation study is informative. However, this is fundamentally a systems efficiency paper dressed lightly in MoE theory language, not a theoretical contribution. The central insight (precompute f(x) when…