COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Xin Liu

Conference on Machine Learning and Systems 2025 · Day 4 · Session 9: Parallel and Distributed Systems

This talk introduces **Comet**, a framework for fine-grained computation-communication overlapping in **Mixture-of-Experts (MoE)** models. Presented by Ningxin Zheng of ByteDance, Comet targets a critical bottleneck in the distributed training and inference of large-scale MoE architectures: the latency introduced by communication operations such as the all-to-all token dispatch and combine. As MoE models become increasingly prevalent in cutting-edge machine learning applications, optimizing their distributed execution is essential for maximizing hardware utilization and reducing operational cost.
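To make the scheduling idea concrete, here is a minimal PyTorch sketch of chunk-level computation-communication pipelining on two CUDA streams. It is illustrative only: the all-to-all dispatch is stubbed out as a device-side copy so the sketch runs on a single GPU, and every name here (`overlapped_moe_layer`, `num_chunks`) is hypothetical rather than Comet's API. Comet itself fuses communication and computation at thread-block granularity inside single kernels, which a stream-level pipeline like this does not capture.

```python
# Minimal sketch of chunk-level computation-communication overlap for an
# MoE layer. Comet operates at thread-block granularity inside fused
# kernels; this coarse, stream-based pipeline only illustrates the
# scheduling idea. Names are illustrative, not Comet's API.
# Requires a CUDA-capable GPU.
import torch


def overlapped_moe_layer(tokens, expert_weight, num_chunks=4):
    """Pipeline a stand-in for all-to-all dispatch (a device copy here)
    against per-chunk expert GEMMs, using two CUDA streams."""
    comm_stream = torch.cuda.Stream()
    comp_stream = torch.cuda.Stream()
    # Ensure the comm stream sees tokens produced on the default stream.
    comm_stream.wait_stream(torch.cuda.current_stream())

    chunks = tokens.chunk(num_chunks, dim=0)
    received, events, outputs = [], [], []

    for chunk in chunks:
        # "Communication": in a real deployment this would be an async
        # collective (e.g. torch.distributed.all_to_all_single with
        # async_op=True); a clone stands in so the sketch runs locally.
        with torch.cuda.stream(comm_stream):
            buf = chunk.clone()
            evt = torch.cuda.Event()
            evt.record(comm_stream)
        received.append(buf)
        events.append(evt)

    for buf, evt in zip(received, events):
        # Expert computation for chunk i begins as soon as its dispatch
        # completes, overlapping with the dispatch of later chunks.
        with torch.cuda.stream(comp_stream):
            comp_stream.wait_event(evt)
            outputs.append(buf @ expert_weight)

    torch.cuda.synchronize()
    return torch.cat(outputs, dim=0)


if __name__ == "__main__":
    t = torch.randn(1024, 512, device="cuda")
    w = torch.randn(512, 512, device="cuda")
    print(overlapped_moe_layer(t, w).shape)  # torch.Size([1024, 512])
```

In a multi-GPU deployment the `clone` would be replaced by an asynchronous collective such as `torch.distributed.all_to_all_single(..., async_op=True)`, with the returned work handle playing the role of the CUDA event.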

AI review

Comet is legitimately interesting systems work — fine-grained computation-communication overlapping for MoE layers with real production validation at scale. The shared tensor dependency model and adaptive thread block specialization are genuine technical contributions, not marketing. But the article summary leaves too much on the table: no code, no open-source link, no reproducible benchmark setup beyond 'eight A100s and Megatron,' and the offline profiling story is hand-waved when it's actually the hardest operational part. Engineers working on MoE infrastructure will find useful framing…