Lightning Talk: Kueue: Save Some QPS for the Rest of Us! How To Manage 100k Updates Pe...
P. Bundyra
KubeCon + CloudNativeCon Europe 2025 · Lightning Talk
This lightning talk by P. Bundyra, a Google engineer, introduces **Kueue**, a cloud-native queuing system designed for managing **AI workloads** in Kubernetes clusters. Developed with the open-source community since 2022, Kueue acts as a "bouncer" for the cluster: it decides which jobs or pods can run and which must wait, preventing resource contention and ensuring fair access. The problem the talk addresses is how to expose a workload's position in a queue efficiently when there may be tens of thousands of active workloads, without overwhelming the Kubernetes API server or its underlying etcd datastore.
AI review
This lightning talk by P. Bundyra presents a highly effective and technically sound solution to a critical Kubernetes scaling problem: efficiently exposing queue positions for tens of thousands of AI workloads without overloading the API server or etcd. By cleverly leveraging the Kubernetes API Aggregation Layer with a "lazy evaluation" approach, Kueue demonstrates how to bypass CRD limitations for high-frequency, dynamic data. This is a must-see for any Kubernetes developer or architect looking to build scalable and resilient operators, offering deep insights into API internals and advanced…
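To make the "lazy evaluation" idea concrete, here is a minimal Python sketch. It is not Kueue's implementation, and it does not model the API Aggregation Layer itself. It only contrasts two hypothetical strategies: persisting every workload's position whenever the queue changes, versus keeping the queue in memory and computing a position only when a client asks. The names `PendingQueue`, `admit_head_eager`, `admit_head_lazy`, `position`, and `simulate` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PendingQueue:
    """Hypothetical in-memory view of pending workloads, head of queue first."""

    names: list = field(default_factory=list)
    writes: int = 0  # simulated status writes sent to the API server / etcd

    def admit_head_eager(self) -> str:
        """Eager strategy: persist the new position of every remaining workload."""
        admitted = self.names.pop(0)
        # Every workload behind the admitted one moves up by one,
        # so each of them would need its stored position rewritten.
        self.writes += len(self.names)
        return admitted

    def admit_head_lazy(self) -> str:
        """Lazy strategy: update in-memory state only; nothing is persisted."""
        return self.names.pop(0)

    def position(self, name: str) -> int:
        """Compute a workload's zero-based position on demand, at read time."""
        return self.names.index(name)


def simulate(n: int, admissions: int, eager: bool) -> PendingQueue:
    """Build a queue of n workloads and admit `admissions` of them from the head."""
    queue = PendingQueue(names=[f"wl-{i}" for i in range(n)])
    for _ in range(admissions):
        if eager:
            queue.admit_head_eager()
        else:
            queue.admit_head_lazy()
    return queue


if __name__ == "__main__":
    eager = simulate(1000, 3, eager=True)
    lazy = simulate(1000, 3, eager=False)
    print("eager writes:", eager.writes)
    print("lazy writes:", lazy.writes)
    print("position of wl-10:", lazy.position("wl-10"))
```

In this toy model, three admissions from a queue of 1,000 workloads cost 2,994 simulated writes under the eager strategy and none under the lazy one, while both strategies return the same position when asked. The trade-off is that the lazy strategy moves the cost to read time, which pays off when positions change far more often than they are queried.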