Explain How Kubernetes Works With GPU Like I’m 5 - Carlos Santana, AWS

Carlos Santana, AWS

KubeCon + CloudNativeCon Europe 2025 · Session

In this KubeCon EU session, Carlos Santana, a Solutions Architect at AWS and a CNCF Ambassador, demystifies how graphics processing units (GPUs) are integrated with Kubernetes. Titled "Explain How Kubernetes Works With GPU Like I’m 5," the talk is aimed at newcomers. It walks them through the fundamental building blocks needed to run GPU-accelerated workloads on Kubernetes, with a particular focus on large language models (LLMs). Santana started with a home lab built from affordable hardware such as NVIDIA Jetson devices and old gaming PCs, but the principles and components he covers apply just as well to cloud-based Kubernetes deployments such as Amazon EKS.
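From a workload's point of view, those building blocks end in a single resource request. The sketch below assumes the NVIDIA device plugin is already running on the node and advertising the `nvidia.com/gpu` resource. The helper name `gpu_pod_manifest`, the Pod name and the image tag are illustrative choices, not taken from the talk.

```python
import json

# Extended resource name advertised by the NVIDIA device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that asks the scheduler for whole GPUs.

    Hypothetical helper for illustration. GPUs are requested in `limits`
    only; for extended resources Kubernetes uses the limit as the request.
    Only whole GPUs can be requested this way.
    """
    if gpus < 1:
        raise ValueError("request at least one whole GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # nvidia-smi is typically made available inside the
                    # container by the NVIDIA Container Toolkit on the host.
                    "command": ["nvidia-smi"],
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Illustrative image tag; choose a CUDA image your node's driver supports.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04")
    # kubectl accepts JSON as well as YAML, e.g. `python this.py | kubectl apply -f -`
    print(json.dumps(manifest, indent=2))
```

The GPU count appears only under `limits` on purpose. If both a request and a limit are set for a GPU resource, Kubernetes requires the two values to be equal, so setting the limit alone is the simplest correct form.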

AI review

This session from Carlos Santana provides a brutally honest and deeply technical breakdown of what it *actually* takes to run GPUs with Kubernetes. Forget the `helm install` magic: Santana meticulously dissects the multi-layered stack, from host drivers to Kubernetes device plugins, and emphasizes critical version compatibilities and common pitfalls. It’s a foundational masterclass that skips the vendor hype and delivers actionable knowledge for anyone serious about GPU-accelerated workloads, whether in the cloud or in a home lab. It’s the kind of direct, substantive teaching that cuts through the…
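A common form of those version compatibilities is matching the CUDA version inside a container image against what the host driver supports. The check below is a deliberately simplified sketch. It assumes the conservative rule "image CUDA version <= CUDA version reported by the driver" (for example, in the `nvidia-smi` header). The function names are hypothetical, and real deployments can be more permissive thanks to CUDA minor-version compatibility and forward-compatibility packages.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '12.4' or '12.4.1' into (12, 4); the patch level is ignored."""
    parts = text.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected MAJOR.MINOR, got {text!r}")
    return int(parts[0]), int(parts[1])


def cuda_image_fits_driver(driver_cuda: str, image_cuda: str) -> bool:
    """Conservative, hypothetical check: the container's CUDA version must
    not exceed the CUDA version the host driver reports.

    Ignores CUDA minor-version compatibility and forward-compatibility
    packages, both of which can relax this rule in practice.
    """
    return parse_version(image_cuda) <= parse_version(driver_cuda)


if __name__ == "__main__":
    print(cuda_image_fits_driver("12.2", "12.4.1"))  # False: image newer than driver
    print(cuda_image_fits_driver("12.4", "11.8"))    # True: older image on newer driver
```

Comparing `(major, minor)` tuples keeps the check strict and easy to reason about. A failing check is a signal to upgrade the node driver or pick an older CUDA base image, not proof that the workload cannot run.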
