Data Processing Efficiency: Optimizing Batch Workloads on Kubernetes With Custom...

Hichem Kenniche

KubeCon + CloudNativeCon Europe 2025 · Session

In this KubeCon EU talk, Hichem Kenniche, a seasoned data and machine learning engineer, tackles the challenge of efficiently running batch workloads, such as **ETL (Extract, Transform, Load)**, **ELT (Extract, Load, Transform)**, and **ML (Machine Learning) training and serving**, at scale on Kubernetes. Kenniche highlights a fundamental mismatch: Kubernetes was initially designed for stateless applications, yet the data engineering and ML communities, migrating from older ecosystems like Hadoop, increasingly rely on it for stateful, resource-intensive batch processing. The talk argues that the default Kubernetes scheduler, well engineered for its original purpose, falls short of the requirements of these batch workloads, particularly in multi-tenant environments with finite resources.

AI review

This talk by Hichem Kenniche is a brutally honest, no-nonsense deep dive into the critical need for custom Kubernetes schedulers like Volcano and YuniKorn for efficient batch processing in data and ML workloads. Kenniche, a practitioner who clearly did the work, cuts through the typical KubeCon fluff to expose the fundamental shortcomings of the default scheduler and provides real-world, actionable insights from his team's extensive testing. He details the tangible benefits in resource utilization, cost reduction, and even carbon footprint, backing his claims with practical observations…
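Custom schedulers like Volcano and YuniKorn plug into Kubernetes through the standard `spec.schedulerName` field rather than replacing the control plane, so individual workloads can opt in while the rest of the cluster keeps the default scheduler. A minimal sketch of a batch pod opting into Volcano (the pod name and container image are hypothetical, and this assumes Volcano is already installed in the cluster):

```yaml
# Illustrative only: a Pod that requests scheduling by a custom scheduler
# via the standard `schedulerName` field. When omitted, the field defaults
# to "default-scheduler".
apiVersion: v1
kind: Pod
metadata:
  name: etl-batch-task          # hypothetical name
spec:
  schedulerName: volcano        # route this pod to the Volcano scheduler
  restartPolicy: Never          # typical for run-to-completion batch tasks
  containers:
    - name: worker
      image: my-registry/etl-worker:latest   # hypothetical image
      resources:
        requests:               # explicit requests matter for batch
          cpu: "2"              # bin-packing and queue accounting
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi
```

In practice, batch schedulers layer further concepts on top of this hook (for example, Volcano's gang scheduling via PodGroups, or YuniKorn's hierarchical queues) so that multi-pod jobs are admitted all-or-nothing instead of deadlocking on partial allocations.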
