Challenges of and Solutions for Migrating Spark From Legacy Hadoop Clu... Neha Singla & Rasik Pandey
Neha Singla, Rasik Pandey
KubeCon + CloudNativeCon Europe 2025 · Session
In this talk, Neha Singla and Rasik Pandey of Apple describe the migration of large-scale Apache Spark workloads from bare-metal Hadoop clusters to Kubernetes-based infrastructure. They walk through the architectural evolution, the challenges encountered at each stage, and the solutions adopted to make the Spark environment efficient, cost-effective, and interactive. Because of the scale at which Apple operates, their experience is relevant to organizations planning similar migrations, particularly those that must support interactive Spark applications.
AI review
This talk from Apple gives a valuable and technically deep account of migrating large-scale interactive Spark workloads from legacy Hadoop to Kubernetes, with a focus on the limitations of `kube-scheduler` for such applications. The speakers trace the architectural evolution and argue that a specialized scheduler such as YuniKorn is essential for achieving good performance, resource utilization, and user experience. The insights, including a custom Jupyter Lab plugin for kernel configuration and a sophisticated node pool strategy, provide a…