Reliability: How to Recover from System Downtime? #69

Closed Answered by gagbo
clean99 asked this question in Q&A

The metrics are exposed in the Prometheus format. That means that the data is stored in the Prometheus instance that polls the system, not in the system being monitored.
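To illustrate what "exposed in the Prometheus format" means, here is a minimal sketch of rendering current counter values in the Prometheus text exposition format. The function name `render_metrics` and the metric name `http_requests_total` are made up for this example and are not part of any library discussed here.

```python
def render_metrics(counters: dict) -> str:
    """Render current counter values as Prometheus text exposition lines.

    Hypothetical helper for illustration: a real application would
    normally rely on a client library instead of formatting by hand.
    """
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")  # metadata line declaring the metric type
        lines.append(f"{name} {value}")         # one sample: metric name, then current value
    return "\n".join(lines) + "\n"


# The monitored process only needs to answer "what is the value right now?"
body = render_metrics({"http_requests_total": 3})
print(body)
```

Note that the response carries only the latest value; it is the poller that attaches a timestamp and keeps the history.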

If the system fails catastrophically, there is no way to guarantee that a "graceful shutdown" will happen and save logs or metrics to persistent storage. That's why you usually want to store the data in a system other than the one you are monitoring (here, that other system is the Prometheus instance), and why information about system downtime is always "best effort".
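A toy simulation can make this split concrete. This is only a sketch under stated assumptions: `MonitoredApp` and `Scraper` are hypothetical classes standing in for the monitored process and the Prometheus instance, and timestamps are passed in by hand rather than read from a clock.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredApp:
    """Hypothetical monitored process: holds only the current counter value in memory."""
    requests_total: int = 0

    def handle_request(self) -> None:
        self.requests_total += 1

    def scrape(self) -> int:
        # What a poll would read: the value right now, with no history.
        return self.requests_total


@dataclass
class Scraper:
    """Hypothetical stand-in for the Prometheus instance: owns the stored samples."""
    samples: list = field(default_factory=list)

    def poll(self, app: MonitoredApp, now: float) -> None:
        self.samples.append((now, app.scrape()))


scraper = Scraper()
app = MonitoredApp()

app.handle_request()
app.handle_request()
scraper.poll(app, now=0.0)    # records (0.0, 2)

app.handle_request()
scraper.poll(app, now=15.0)   # records (15.0, 3)

app.handle_request()          # happens after the last poll, before the crash
app = MonitoredApp()          # crash and restart: in-memory state is gone

app.handle_request()
scraper.poll(app, now=45.0)   # records (45.0, 1)

print(scraper.samples)        # [(0.0, 2), (15.0, 3), (45.0, 1)]
```

The history recorded before the crash survives because it lives in the scraper, while the request handled after the last poll is never recorded, which is the "best effort" part. The drop from 3 to 1 is how a restart shows up in the stored series; Prometheus functions such as `rate()` treat a decreasing counter as a reset.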

At runtime, the system being monitored only stores the current values of the metrics, not the time series, so the…

Answer selected by emschwartz