Dancing With the Pods: Live Migration of a Database Fleet While Serving... Jayme Bird & Manish Gill
Jayme Bird, Manish Gill
KubeCon + CloudNativeCon Europe 2025 · Session
This article covers live, zero-downtime migration of a large fleet of ClickHouse databases running on Kubernetes. In their KubeCon EU talk, Jayme Bird and Manish Gill of ClickHouse describe moving thousands of production clusters from a single-StatefulSet orchestration model to a multi-StatefulSet architecture. The change was needed to implement "make before break" vertical autoscaling, in which replacement capacity is brought up before the old capacity is removed. Kubernetes StatefulSets do not natively support this for stateful workloads.
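To make the "make before break" idea concrete, here is a toy Python model that contrasts it with an in-place rolling resize. It is only a sketch under stated assumptions, not the ClickHouse operator or any Kubernetes API: the `Pod` class, the function names, and the `set-b` naming are all hypothetical, and pods are assumed to become ready instantly.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical stand-in for a database replica pod.
    name: str
    cpu: int
    ready: bool = True


def serving(pods):
    """Count the pods currently able to serve queries."""
    return sum(1 for p in pods if p.ready)


def rolling_resize_in_place(pods, new_cpu):
    """Break before make: stop each pod, then bring it back at the new size.

    Returns the lowest serving count observed during the resize.
    """
    min_serving = serving(pods)
    for p in pods:
        p.ready = False  # the old pod is terminated first
        min_serving = min(min_serving, serving(pods))
        p.cpu = new_cpu  # recreated with the new resources
        p.ready = True
    return min_serving


def make_before_break_resize(old_pods, new_cpu, new_set="set-b"):
    """Make before break: add a ready replacement, then retire one old pod.

    Replacements are named as if they lived in a second set (an assumption
    made for illustration). Returns the new pod list and the lowest serving
    count observed after each retirement.
    """
    pods = list(old_pods)
    min_serving = serving(pods)
    for i, old in enumerate(old_pods):
        pods.append(Pod(f"{new_set}-{i}", new_cpu))  # new capacity first
        pods.remove(old)                             # then drop the old pod
        min_serving = min(min_serving, serving(pods))
    return pods, min_serving


if __name__ == "__main__":
    fleet = [Pod(f"set-a-{i}", cpu=4) for i in range(3)]
    print("in-place min serving:", rolling_resize_in_place(fleet, new_cpu=8))

    fleet = [Pod(f"set-a-{i}", cpu=4) for i in range(3)]
    resized, low = make_before_break_resize(fleet, new_cpu=8)
    print("make-before-break min serving:", low)
```

In this model the in-place resize dips below the original replica count while each pod restarts, whereas the make-before-break variant never does, at the cost of temporarily running one extra pod.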
AI review
This talk by Jayme Bird and Manish Gill from ClickHouse details the live, zero-downtime migration of thousands of production ClickHouse clusters from a single-StatefulSet model to a multi-StatefulSet architecture on Kubernetes. It addresses the limitations of Kubernetes StatefulSets for dynamic "make before break" scaling and presents a solution built on custom migration controllers, Temporal for orchestration, and deep, database-specific modifications. This isn't just theory; it's a battle-tested blueprint for achieving…