Taming 50 Billion Time Series: Operating Global-Scale Prometheus Dep...
Orcun Berkem, Alan Protasio
KubeCon + CloudNativeCon Europe 2025 · Session
In this talk, Orcun Berkem and Alan Protasio of AWS describe the challenges of operating a global-scale, **Prometheus**-compatible monitoring service on **Kubernetes** and the solutions they adopted. The talk covers the architectural evolution and operational strategies behind AWS Managed Prometheus, a service that takes the operational burden of managing Prometheus infrastructure off its users. Both speakers are active in open-source observability, and they explain how they use and contribute to **Cortex**, an open-source, scalable, highly available Prometheus-compatible system.
AI review
This talk gives a detailed account of operating a global-scale Prometheus-compatible monitoring service. The speakers, who work on both the Cortex project and AWS Managed Prometheus, trace the architectural evolution that starts from Prometheus's inherent limitations and ends at a cellular architecture. They present solutions for multi-tenancy, blast radius reduction, deployment safety, and proactive failure detection, while using and contributing to open-source projects. This is a rare look into the real-world engineering challenges and solutions behind a critical…