Highly Available Prometheus Metrics for Distributed SQL with Thanos on GKE
In the last few years, Prometheus has gained huge popularity as a tool for monitoring distributed systems. It has a simple yet powerful data model and query language; however, it can pose a challenge when it comes to high availability and to long-term storage of historical metric data. Adding more Prometheus replicas improves availability, but Prometheus on its own does not offer continuous availability. For example, if one of the Prometheus replicas crashes,
…