Scaling GPU Clusters Without Melting Down! - Alay Patel & Ryan Hallisey, NVIDIA

Alay Patel, Ryan Hallisey, NVIDIA

KubeCon + CloudNativeCon Europe 2025 · Session

As Graphics Processing Units (GPUs) grow rapidly more powerful and enable complex AI, machine learning, and high-performance computing workloads, scaling the underlying infrastructure has become harder. This talk, delivered by NVIDIA's Alay Patel and Ryan Hallisey at KubeCon EU, covers a critical and often overlooked problem: keeping the Kubernetes control plane stable and performant as GPU clusters scale. The speakers share NVIDIA's firsthand experience and the architectural lessons learned from operating bare-metal Kubernetes environments under extreme load.

AI review

This isn't some 'AI-powered' marketing drivel. This is NVIDIA showing how they nearly melted down their Kubernetes control plane while scaling GPU clusters, and what real, technical solutions they engineered. From taming API server OOMs with API Priority and Fairness (APF) and aggressive Go GC tuning, to solving connection skew with `goaway-chance`, to proving out the future of Dynamic Resource Allocation (DRA), this talk delivers substance. It's a masterclass in operational deep dives, offering actionable insights for anyone running Kubernetes at scale. No bullshit, just hard-won engineering.
