Scaling Shopify's Search: Enhancing Elasticsearch Resilience With Kubernetes and KE...
Leila Vayghan
KubeCon + CloudNativeCon Europe 2025 · Session
In this talk from KubeCon EU, Leila Vayghan, a Site Reliability Engineer in Shopify’s resiliency organization, detailed a project to improve Elasticsearch indexing performance and infrastructure reliability. The core problem was contention between high-volume, bursty reindexing operations and latency-sensitive real-time indexing, which competed for the same compute and storage resources. This contention degraded performance, left merchants with stale search results, and created significant operational overhead.
AI review
This talk from Shopify's Leila Vayghan presents a highly effective and technically sound solution to a common pain point in large-scale Elasticsearch deployments: resource contention between real-time and reindex workloads. By leveraging dedicated Kubernetes node pools, KEDA for event-driven autoscaling based on a clever custom Prometheus metric (shards per node), and Elasticsearch's native allocation rules, Shopify not only eliminated critical performance bottlenecks and pager fatigue but also achieved substantial infrastructure cost savings. It's a blueprint for any SRE or platform…
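The scaling and isolation mechanism summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not Shopify's implementation. It assumes that the Prometheus query behind the KEDA trigger returns the total number of shards that must be hosted on the dedicated reindex node pool, and that the trigger's threshold is the target number of shards per node. Under KEDA's default `AverageValue` metric type, the Kubernetes HPA then targets roughly `ceil(total / threshold)` replicas. The helper names, the `workload` node attribute, and its `reindex` value are hypothetical. `index.routing.allocation.require.*` and `index.routing.allocation.total_shards_per_node` are standard Elasticsearch index settings.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingBounds:
    # Mirrors the minReplicaCount / maxReplicaCount fields of a KEDA ScaledObject.
    min_replicas: int
    max_replicas: int


def desired_replicas(total_shards: int, target_shards_per_node: float,
                     bounds: ScalingBounds) -> int:
    """Replica count an HPA would target for an AverageValue metric.

    Assumption: total_shards is the value returned by a hypothetical
    Prometheus query for shards that must live on the reindex pool.
    With AverageValue, the HPA compares total / replicas to the target,
    so the desired count is ceil(total / target). The HPA's default 10%
    tolerance band and stabilization windows are ignored in this sketch.
    """
    if target_shards_per_node <= 0:
        raise ValueError("target_shards_per_node must be positive")
    raw = math.ceil(total_shards / target_shards_per_node)
    # Clamp to the ScaledObject's bounds, as KEDA/HPA would.
    return max(bounds.min_replicas, min(bounds.max_replicas, raw))


def reindex_index_settings(pool_attr_value: str,
                           max_shards_per_node: int) -> dict:
    """Index settings that pin a reindex target index to the dedicated pool.

    Assumes Elasticsearch nodes in that pool start with a custom node
    attribute such as `node.attr.workload: reindex` (hypothetical name).
    """
    return {
        # Only allocate this index's shards to nodes whose workload attribute matches.
        "index.routing.allocation.require.workload": pool_attr_value,
        # Cap how many of this index's shards may sit on any single node.
        "index.routing.allocation.total_shards_per_node": max_shards_per_node,
    }


if __name__ == "__main__":
    bounds = ScalingBounds(min_replicas=0, max_replicas=20)
    print(desired_replicas(total_shards=45, target_shards_per_node=4, bounds=bounds))
    print(reindex_index_settings("reindex", max_shards_per_node=2))
```

For example, 45 shards with a target of 4 shards per node yields `ceil(11.25) = 12` replicas. In this sketch a minimum of 0 would let the reindex pool scale to zero between reindex runs, while real-time indexing stays on its own pool because the `require` filter keeps reindex shards off those nodes.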