Kubernetes WG Device Management - GPUs, TPUs, NICs and More With DRA - Kevin Klues & Patrick Ohly

Kevin Klues, Patrick Ohly

KubeCon + CloudNativeCon Europe 2025 · Session

This talk gives an update on the work of the Kubernetes Working Group Device Management, focusing on recent advances in and ongoing development of **Dynamic Resource Allocation (DRA)**. Presented by co-chairs Patrick Ohly and Kevin Klues, who were instrumental in initiating DRA within Kubernetes, the session traces the project's evolution from an early effort to a beta-level feature poised for General Availability (GA). The core mission of DRA is to enable simple, efficient configuration, sharing, and allocation of accelerators and other specialized devices, extending Kubernetes' capabilities beyond traditional microservice management to support modern workloads like AI inference and training.

AI review

This talk provides an important update on Kubernetes Dynamic Resource Allocation (DRA), a foundational re-architecture of device management. It's not just an incremental feature; it's a necessary evolution for Kubernetes to effectively handle modern, hardware-accelerated workloads like AI/ML and HPC. The speakers, core to the project, detail its rapid maturation to beta, its target for GA, and the introduction of new features for granular control, reliability, and multi-node hardware orchestration. This is substantive work that matters for anyone running critical infrastructure on Kubernetes.
