Pushing the Limits of Prometheus at Etsy - Chris Leavoy, Etsy & Bryan Boreham, Grafana Labs

Chris Leavoy (Etsy), Bryan Boreham (Grafana Labs)

KubeCon + CloudNativeCon Europe 2025 · Session

This talk covers the challenges Etsy encountered, and the solutions it developed, while operating one of the largest single Prometheus servers in the industry. Chris Leavoy of Etsy and Bryan Boreham of Grafana Labs describe what happens when Prometheus is scaled vertically to its limits, including the unexpected bottlenecks they hit and practical strategies for optimization. They share lessons learned from handling millions of metrics, high-churn environments, and the need for robust observability during peak periods such as Black Friday.

AI review

This talk from Etsy and Grafana Labs isn't just another Prometheus scaling story; it's an honest and deeply technical look at the limits of vertical scaling. The speakers didn't just throw more hardware at the problem; they profiled the Go runtime, uncovered subtle cloud-provider network limitations, and re-engineered core Prometheus TSDB compaction and remote write logic. The findings are not just academic; they saved Etsy's Black Friday, led to upstream contributions, and provide actionable, low-level insights that challenge conventional wisdom. This is exactly the kind of deep…
