PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training

Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory Ganger, Yida Wang

Conference on Machine Learning and Systems 2025 · Day 2 · Session 2: Parallel and Distributed Systems

The proliferation of large language models (LLMs) has necessitated increasingly sophisticated and scalable training techniques. Among these, **pipeline parallelism** has emerged as a crucial strategy for distributing the immense computational and memory demands of LLMs across multiple GPUs. However, a significant inefficiency inherent to pipeline parallelism is the phenomenon of **pipeline bubbles** – periods during which GPUs sit idle, waiting for data to propagate through the pipeline or for gradient synchronization. This talk introduces **PipeFill**, a novel system designed to reclaim this wasted GPU compute time by intelligently scheduling and executing independent "fill jobs" during these bubble periods. Developed by Daiyaan Arfeen and collaborators at AWS, PipeFill addresses a critical challenge in large-scale LLM training: maximizing GPU utilization to reduce training costs and accelerate development cycles.
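The scheduling idea can be illustrated with a toy sketch: given estimated durations for each pipeline bubble and for the chunks of a fill job's computation graph, greedily pack chunks into bubbles until the next chunk no longer fits. This is a minimal illustration, not the authors' implementation; the function name, the millisecond durations, and the assumption that the graph is already partitioned into bubble-sized chunks are all hypothetical.

```python
# Hypothetical sketch of greedy bubble-filling (not PipeFill's actual code).
def pack_fill_tasks(bubbles, tasks):
    """Greedily assign fill-job chunks, in graph order, to bubble slots.

    bubbles: list of idle-window durations (ms), one per pipeline bubble.
    tasks:   list of per-chunk runtimes (ms) for the fill job's graph,
             assumed already partitioned into bubble-sized pieces.
    Returns a schedule: one list of chunk indices per bubble.
    """
    schedule = [[] for _ in bubbles]
    t = 0  # index of the next unscheduled chunk
    for b, capacity in enumerate(bubbles):
        used = 0.0
        # Fill this bubble until the next chunk would overrun it.
        while t < len(tasks) and used + tasks[t] <= capacity:
            schedule[b].append(t)
            used += tasks[t]
            t += 1
    return schedule

# Example: two bubbles of 10 ms and 6 ms, five small chunks.
print(pack_fill_tasks([10.0, 6.0], [4.0, 4.0, 4.0, 3.0, 3.0]))
# → [[0, 1], [2]]  (chunks 3 and 4 wait for later bubbles)
```

Any real scheduler would also have to respect the fill job's memory footprint alongside its runtime, which is where the adaptive memory controls discussed in the talk come in.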

AI review

PipeFill is a genuinely interesting systems paper tackling a real inefficiency in pipeline-parallel LLM training. The core idea (greedy partitioning of fill-job computation graphs to fit within pipeline bubbles, with adaptive memory knobs) is technically credible and practically motivated. But the article is working from an incomplete transcript, the experimental results section is essentially empty, and the reproducibility bar is nowhere near what you'd want from a talk claiming substantial GPU utilization gains. Worth watching, but not a must-watch.