Clarify how Prometheus uses the OpenMetrics "Created" timestamp #46

Closed
jmacd opened this issue Apr 21, 2021 · 3 comments

Comments

@jmacd

jmacd commented Apr 21, 2021

The OpenMetrics specification states for Counter metrics:

A MetricPoint in a Metric with the type Counter SHOULD have a Timestamp value called Created. This can help ingestors discern between new metrics and long-running ones it did not see before.

A MetricPoint in a Metric's Counter's Total MAY reset to 0. If present, the corresponding Created time MUST also be set to the timestamp of the reset.

The OpenTelemetry data model agrees that this field is useful, and that it should be optional. We have argued that when the Created / Start time is not set, it is possible to miss process restarts, and thus undercount metrics for short-lived processes.

We are trying to define the proper translation into OTLP for metric points when the Created time is not known. This is relevant to lightstep/opentelemetry-prometheus-sidecar, which reads the Prometheus WAL and writes OTLP metric streams. We believe that a Created / Start time can be filled in by any stateful observer that is able to remember the last value and its timestamp.

When a stateful observer possesses this information, we believe that the processor SHOULD fill in the missing start timestamp.
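
To make the idea concrete, here is a minimal Python sketch of such a stateful observer. It is not the sidecar's actual code: the names `StartTimeTracker` and `observe` are hypothetical, and two choices in it are assumptions rather than anything the specs mandate. First, when a series is seen for the first time without a Created time, the observation time is used as the start. Second, when the value decreases, the reset is assumed to have happened after the last remembered point, so that point's timestamp becomes the new start.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SeriesState:
    start_ts: float    # inferred start of the current cumulative sequence
    last_ts: float     # timestamp of the last observed point
    last_value: float  # value of the last observed point


class StartTimeTracker:
    """Hypothetical stateful observer that fills in missing start timestamps."""

    def __init__(self) -> None:
        self._state: Dict[str, SeriesState] = {}

    def observe(self, series: str, ts: float, value: float,
                created: Optional[float] = None) -> float:
        """Record one cumulative point and return the start timestamp to attach."""
        prev = self._state.get(series)
        if created is not None:
            # The source reported a Created time, so trust it.
            start = created
        elif prev is None:
            # Assumption: first sight of the series, true start unknown.
            start = ts
        elif value < prev.last_value:
            # Assumption: a decrease means a reset after the last seen point.
            start = prev.last_ts
        else:
            # Same cumulative sequence as before.
            start = prev.start_ts
        self._state[series] = SeriesState(start, ts, value)
        return start
```

For example, observing values 5, 8, 2 at times 10, 20, 30 would yield start times 10, 10, 20: the drop from 8 to 2 is treated as a reset that occurred some time after t=20. An observer without this state cannot tell that case apart from a long-running series it simply had not seen before.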

The issue here is investigatory. Does Prometheus have plans to use the OpenMetrics Created timestamp and eventually include that in its WAL?

@jeromeinsf

Is the proposal more general, so as to include bitemporal modeling that could be used as hints for timing the computation of recording rules when data is updated or arrives late?

@jmacd
Author

jmacd commented Apr 21, 2021

@jeromeinsf If I understand your use of the term correctly, this is probably not the bitemporal modeling conversation you're looking for.

This "Created" or "Start" timestamp is used to support knowing when a cumulative series was reset.

Your question @jeromeinsf relates to late-arriving data, and this is definitely an important discussion. Right now, especially in this working group, we are focused on pull-based metrics, and Prometheus uses "staleness markers" to consistently indicate missing data. I see two follow-on questions for push-based systems:

  1. For a system pushing OTLP metrics from SDK to Collector, can a stateful processor in the Collector indicate that no data arrived? This is discussed in opentelemetry-specification#1078 (Metrics semantic convention: "up" metric).
  2. For a system re-aggregating OTLP metrics inside a Collector, one that is aware of late-arriving data, can the process correctly update its state and issue new data points that reflect a later understanding of the world? This has not yet been discussed in a dedicated issue, but I'd like to connect this discussion with #35 (Clarify the meaning and purpose of external labels). Prometheus currently uses external labels to describe the Prometheus process that collects data. In a High-Availability configuration, each replica has a distinct value of some spatial dimension that a downstream processor can erase (see opentelemetry-specification#1297, Metrics: Requirements for safe attribute removal) to reconstruct a single stream of data. I would like to use external labels to model late-arriving data as well. In other words, I think we can use external labels to express temporal replication. A stateful processor that is aware of late-arriving data could re-issue an identical data point with a new resource attribute indicating a real or virtual timestamp associated with the update. A downstream processor can then correctly compute the state of the world as seen by an observer at a given point in time. (Note I'm ignoring clock synchronization issues!)
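
A rough Python sketch of what the downstream side of point 2 might look like. Everything here is hypothetical: the `Point` type, the `revision_ts` attribute name, and the `as_of` helper are made up for illustration and are not part of OTLP or Prometheus. The assumption is that each re-issued point carries the same series and event timestamp as the original, plus an attribute recording when the update was made.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Point:
    series: str         # series identity (name plus attributes, flattened)
    ts: float           # event timestamp of the data point
    value: float
    revision_ts: float  # hypothetical attribute: when this version was issued


def as_of(points: Iterable[Point],
          observer_ts: float) -> Dict[Tuple[str, float], float]:
    """Return the values an observer would have seen at observer_ts.

    For each (series, ts) pair, keep the latest revision that had already
    been issued at observer_ts and ignore revisions issued later.
    """
    best: Dict[Tuple[str, float], Point] = {}
    for p in points:
        if p.revision_ts > observer_ts:
            continue  # not yet known to this observer
        key = (p.series, p.ts)
        current = best.get(key)
        if current is None or p.revision_ts > current.revision_ts:
            best[key] = p
    return {key: p.value for key, p in best.items()}
```

With an original point issued at t=100 and a correction for the same event time issued at t=160, `as_of(points, 120.0)` would return the original value and `as_of(points, 200.0)` the corrected one. Like the comment above, this ignores clock synchronization.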

@jsuereth

This has been clarified and we will account for this in our data model.

@alolita alolita added this to the Data Model and Architecture milestone Jun 18, 2021