Counter, UpDownCounter, and Gauge instruments compared #156

Closed
wants to merge 4 commits into from
356 changes: 356 additions & 0 deletions text/metrics/0156-counter-gauge-model.md
@@ -0,0 +1,356 @@
# Counter, UpDownCounter, and Gauge instruments explained

Counter and Gauge instruments are different in the ways they convey
meaning, and they are interpreted in different ways. Attributes
applied to metric events enable further interpretation. Because of
their semantics, the interpretive outcome of adding an attribute for

Hm, adding an attribute to what?

There are some words here that are a bit hard to understand: "convey meaning", "interpretive outcome".

Do you mean something like this?

The meaning of adding an attribute of a Counter instrument to (something) is not the same as adding an attribute of a Gauge instrument to (something).

Counter and Gauge instruments is different.

With Counter instruments, a new attribute can be introduced with

Sorry, introduced to what?

Who does the attribute belong to? The counter instrument?

Do you mean something like this?

A new counter attribute can be introduced with additional measurements to subdivide a variable count.

additional measurements to subdivide a variable count.

With Gauge instruments, a new attribute can be introduced with
additional measurements to make multiple observations of a variable.

The OpenTelemetry Metrics API introduces a new kind of instrument, the
UpDownCounter, that behaves like a Counter, meaning that attributes
subdivide the variable being counted, but their primary interpretation
is like that of a Gauge.

At this point I'm thinking, "ok, that's nice, but why should I care?"

I would suggest starting this doc at the very top by briefly illustrating the problem that exists when you only have Counter and Gauge instruments, to help the reader see that things aren't fine the way they are, and motivate them to understand the problem and care about this solution.

Actually, you might just start with the use-case that you wrote up for me in our slack chat about this.

The user journey I'm after is this:
1. As a user I know nothing about metrics named random_oss_software.* but they were produced with OTel SDKs.
2. I click "Create a dashboard"
3. I enter the "random_oss.software.*" pattern
4. I get a good dashboard.

and then show at least one scenario (doesn't have to be exhaustive) where this is not possible to do without the UpDownCounter instrument.

Then you could move into the rest of the doc as written.

I find this a bit hard to understand.

The OpenTelemetry Metrics API introduces a new kind of instrument, the UpDownCounter, that behaves like a Counter,

Perfect, so far so good 👍

meaning that attributes subdivide the variable being counted,

It starts getting a bit confusing here. Who do these attributes belong to? I assume they belong to the UpDownCounter instrument. The behaves like a Counter, meaning that attributes subdivide the variable being counted tells me that the Counter attributes subdivide the variable being counted, and since UpDownCounter behaves like a Counter, then I understand that UpDownCounter has attributes that also subdivide the variable being counted.

but their primary interpretation is like that of a Gauge.

Ok, here it gets confusing. The use of "but" here makes me understand that the difference between Counter and UpDownCounter is that the primary interpretation of the latter is like that of a Gauge. What does that mean? The difference between the words Counter and UpDownCounter is "UpDown" so, is the "UpDown" what a Gauge does? 🤷 I was expecting something like this instead: but the UpDownCounter is non-monotonic.


## Background

OpenTelemetry has a founding principal that the interface (API) should

tiny nit: 'principal' => 'principle'

be decoupled from the implementation (SDK), thus the Metrics project
set out to define the meaning of metrics API events.

OpenTelemetry uses the term _temporality_ to describe how Sum
aggregations are accumulated across time, whether they are reset to
zero with each interval (_delta_) or accumulated over a sequence of
intervals (_cumulative_). Both forms of temporality are considered
important, as they offer a useful tradeoff between cost and
reliability. The data model specifies that a change of temporality

The temporality of a sum aggregation can change? I mean, can it be delta now, cumulative later, then delta again?

If you're talking about versions of a service, possibly. I could code service A to use deltas, then decide to switch to cumulatives in version 2.

Practically you shouldn't expect a service to be going back and forth between delta + cumulative points.

does not change meaning.

OpenTelemetry recognizes both synchronous and asynchronous APIs are
useful for reporting metrics, and each has unique advantages. When
used with Counter and UpDownCounter instruments, there is an assumed

What is being used with counter and updowncounter instruments?

I read that line as referring to the terms synchronous and asynchronous when used with Counter and UpDownCounter, as in synchronous Counter vs asynchronous Counter etc. (SumObserver, as I read in the Lightstep docs; also see the comparison table)

relationship between the aggregation temporality and the choice of
synchronous or asynchronous API. Inputs to synchronous

Is the API synchronous or asynchronous? Would it be more correct to say instruments are synchronous or asynchronous?

(UpDown)Counter instruments are the changes of a Sum aggregation

thought: I find the phrase changes of a Sum aggregation difficult to mentally parse. What about adding the word incremental to more explicitly distinguish it from the behavior of an asynchronous counter, e.g.

Inputs to synchronous (UpDown)Counter instruments are the incremental changes to a Sum aggregation (i.e., deltas)

+1.

I would suggest that we avoid "incremental" since it might suggest monotonicity (or we will have to put something like "incremental/decremental").
My suggestion would be "Inputs to synchronous (UpDown)Counter instruments are the deltas, while inputs to asynchronous instruments are always the absolute value (e.g. the total sum)."

(i.e., deltas). Inputs to asynchronous (UpDown)Counter instruments
are the totals of a Sum aggregation (i.e., cumulatives).
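
As a minimal sketch of the relationship described above (plain Python, not OpenTelemetry API code; the helper names `deltas_to_cumulative` and `cumulative_to_deltas` and the sample values are hypothetical), the same Sum can be written either as a series of changes or as a series of running totals, and each form can be recovered from the other:

```python
from itertools import accumulate


def deltas_to_cumulative(deltas, start=0):
    """Running totals, the form an asynchronous (UpDown)Counter reports."""
    return [start + total for total in accumulate(deltas)]


def cumulative_to_deltas(totals, start=0):
    """Per-interval changes, the form synchronous (UpDown)Counter inputs take."""
    previous = [start] + list(totals[:-1])
    return [current - prior for current, prior in zip(totals, previous)]


# Hypothetical changes to a queue size; negative values are allowed
# because an UpDownCounter is non-monotonic.
deltas = [5, -2, 4, -3]
totals = deltas_to_cumulative(deltas)
print(totals)                        # [5, 3, 7, 4]
print(cumulative_to_deltas(totals))  # [5, -2, 4, -3]
```

Because the conversion is lossless in both directions, a change of temporality leaves the meaning of the Sum intact, provided the starting total is known.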

It might be useful to explain (or link to an explanation of) why synchronous Counters are assumed to be deltas vs why async Counters are assumed to be cumulative, to help the reader follow the reasoning of why those things are true (or common).


## Glossary

_Meaning_: Metrics API events have a semantic definition that dictates

This would be a bit easier to understand if the glossary also had an entry for Metrics API events.

the meaning of the event, in particular how to interpret the integer
or floating point number value passed to the API.

_Interpretation_: How we extract information from metrics data using
the semantics of the API and the semantics of the OTLP data points.

_Metric instrument_ is a named instrument, belonging to an
instrumentation library, declared with one of the OpenTelemetry
Metrics API instruments. For the purpose of this text, it is a
Counter, an UpDownCounter, or a Gauge.

_Metric attributes_ can be applied to Metric API events, which allows

This describes what can be done with metric attributes (can be applied to metric API events), but it does not actually define what metric attributes are.

I would imagine attributes are specific to each metric so the spec wouldn't define what the attributes are. This is what is referred to as labels in the ASCII art in the spec but the agreed-upon name going forward is attributes.

interpreting the meaning of events using different subsets of
attribute dimensions.

_Metric data stream_ is a collection of data points, written by a
writer, having an identity that consists of the instrument's name, the

Is written by a writer redundant?

...having an identity that consists of the instrument's name

What instrument is this? How is this instrument related to the metric data stream?

Who has an identity that consists of the instrument's name, the instrumentation library, resource attributes, and metric attributes? The Metric data stream or the data points?

instrumentation library, resource attributes, and metric attributes.

Is this meant to help define an interface/standard or simply saying that a data stream encompasses all of the data and the source?


_Metric data points_ are the items in a stream, each has a point

Each point has a point kind sounds a bit redundant. I think it would be better to say each point has a kind.

kind. For the purpose of this text, the point kind is Sum or Gauge.

What does "for the purpose of this text" mean here? Are the point kinds going to be different for the purpose of other texts?

Sum points have two options: Temporality and Monotonicity.

Temporality is previously defined, but not Monotonicity.


_Metric timeseries_ is the output of aggregating a stream of data

This is a bit hard to understand without having previous understanding of what aggregating a stream of data is. I think this definition needs another statement. For example, let's define what is an arithmetic sum:

sum: is the result of adding two summands.

This can be made more clear by adding a direct description of what a sum is:

sum: Is a number, the result of adding two summands.

Maybe this definition can begin like this?

Metric timeseries is a sequence of ... that results from aggregating a stream of data points...

points for a specific set of resource and attribute dimensions.

## Meaning and interpretation of Counter and UpDownCounter events

I don't know if this is what you're going for, but if your main goal is to explain the need for UpDownCounter, it might make sense to omit it from this section and wait to present it later as a solution to the problem described in this doc. If that's the goal, it feels a bit premature to describe it here, because I as a reader don't yet know why it's necessary or useful and I get a bit lost.


Counter and UpDownCounter instruments produce Sum metric data
points that are taken to have meaning in a metric stream, independent

What does that are taken to have meaning in a metric stream mean?

of the aggregation temporality, as follows:

- Sum points are quantities that define a rate of change with respect to time
- Rate-of-change over time combined with a reset time may be used to derive a current total.

The rate interpretation is preferred for monotonic Sum points, and the

I think the 2 previous points are interpretations, right?

If that is the case, referring to one as "the rate interpretation" is confusing because both points use the word rate. Better to number these interpretations and refer to them as the first or second one.

Suggested change
- The rate interpretation is preferred for monotonic Sum points, and the
+ The rate interpretation is preferred for monotonic Sum points, and

the current total interpretation is preferred for non-monotonic Sum
points. Both interpretations are meaningful and useful for both kinds

Bit confused here, first this says that one interpretation is preferred for a certain kind of sum point and the other one is preferred for the other kind of sum point. Then it says that both are "meaningful and useful" for both kinds of sum point. Then, why is one preferred over the other? 🤷

of Sum point.

Sum points imply a linear scale of measurement. A Sum value that is
twice the amount of another actually means twice as much of the
variable was counted. Linear interpolation is considered to preserve

Sorry, what do you mean by "linear interpolation is considered to preserve meaning"? Do you mean that it is possible to interpolate linearly between two sum points?

meaning.
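
The two interpretations and the interpolation property can be illustrated with a short Python sketch. It assumes cumulative Sum points with a reset time of `t=0`; the function names `rate` and `interpolate` and the sample data are hypothetical and are not part of any OpenTelemetry API:

```python
def rate(points):
    """Average rate of change between the first and last cumulative points."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def interpolate(points, t):
    """Linearly interpolate the cumulative total at time t."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the observed range")


# (timestamp in seconds, cumulative total since the reset time at t=0)
points = [(0, 0), (10, 40), (20, 100)]

print(rate(points))             # 5.0 per second (rate interpretation)
print(points[-1][1])            # 100 (current total interpretation)
print(interpolate(points, 15))  # 70.0, halfway between 40 and 100
```

The interpolated value is meaningful only because the scale is linear: a total of 70 really does mean 70 units were counted by that time, under the assumption that counting was uniform within the interval.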

You might consider adding a concrete example of a real life metric to help the reader follow the precise but very technical description above (maybe even a diagram, if it lends itself to that). There are some examples I've seen in other places that you could probably just steal, e.g. https://lightstep.com/blog/opentelemetry-101-what-are-metrics/

I'm thinking that would help people like me who learn more readily through examples. Same below with Gauge events.


## Meaning and interpretation of Gauge events

Gauge instruments produce Gauge metric data points that are taken to
have meaning in a metric stream as follows:

- Gauge point values are individual measurements captured at an instant in time

Aren't sum points also individual measurements captured at an instant in time?

- Gauge points record the last known value in a series of individual measurements.

Note that these two statements imply different interpretation for
synchronous and asynchronous measurements. When recording Gauge
values through a synchronous API, the interpretation is "last known
value", and when recording Gauge values through an asynchronous API

Not sure I buy this. In a synchronous Gauge, recording a value is "Here's the current value".

The difference in Async vs. Sync is whether you can have the "current value" or "last known value" be sampled on-demand.

i.e. I think the terminology here is being phrased from an SDK/exporter perspective vs. the instrument's perspective (which I assume is closer to a user of metrics API)

the interpretation is "current value".

Hm, "last known value" and "current value" are very similar. It is hard to understand the difference between synchronous gauge and asynchronous gauge using this concept.


The distinction between last known value (synchronous) and current
value (asynchronous) is considered not significant in the data model.

Hm, but it is significant for the reader of this document to understand the difference between synchronous gauge and asynchronous gauge...

I think you're implicitly saying that when you export metrics, you're setting the Timestamp for a synchronous Gauge to "export time", whereas the actual timestamp was when the synchronous instrument sent the last piece of data.

E.g. Reporting interval

|       X       X           |
t0     t1      t2           t3

Here, the synchronous gauge is reporting its value at t1 and t2. For the collection interval t0 -> t3, we report the value recorded at t2 BUT you're saying for DELTA + CUMULATIVE it would report its timestamp at t3, whereas an asynchronous instrument would just capture a value at t3.

I think this needs some kind of picture.

Contrasting with Sum points, less can be assumed about the
measurements. No implied linear scale of measurement, therefore:

- Rate-of-change may not be well defined
- Ratios are not necessarily meaningful

These statements feel too strong to me. I can think of instances where gauges are very useful in the context of rates, ratios, and trend modeling. However, I think it's important to say that Gauges are signed with their own timestamp. Valuable analysis likely needs some data alignment (aggregating by some method to regular time intervals) in order to make meaningful comparisons.

- Linear interpolation is not necessarily supported.

## Attributes are used for interpretation

This section header doesn't quite seem to flow naturally with the text, maybe call this section something
like "Adding attributes to events for additional meaning" to help tie it to previous and subsequent sections.


Metric attributes enable new ways to interpret a stream of metric
data. Metric attributes add information without changing the value of
a metric event. Addition and removal of metric attributes can be
accomplished safely by applying transformations that preserve meaning.

Addition of attributes on a metric event can create new timeseries, by
producing a greater number of distinct attribute sets. However,
the meaning in the original events is preserved in the complete set of
timeseries.

Removing attributes from metric streams without changing meaning

I'd suggest moving this down below the example, and adding a sentence afterward to draw attention to the problem, e.g.

"...which means applying the natural aggregation function to merge metric streams. But how do we know which is the "natural" aggregation function? For Counters the answer is always SUM (for: reason), however for Gauges it might be SUM or MEAN depending on the semantics of the values represented by the particular metric. (for: reason). Therein lies the problem."

requires re-aggregation, in general, which means applying the natural
aggregation function to merge metric streams.

For example, any metric event with no attributes:

```
gauge.Set(value)
```

can be extended by a new attribute, without changing its meaning or
altering any existing interpretation:

```
gauge.Set(value, { 'property': this.property })
```

## New measurements: Counter and UpDownCounter instruments

Sum points have been defined to have linear scale of measurement,

I had some trouble deciphering the phrase "linear scale of measurement". I understand the concept, after reading the rest of the paragraph, but I wonder if there's a different way to say this, maybe by adding an explanatory aside, something like this (although it probably doesn't use the right terminology)

"Sum points have been defined to have linear scale of measurement, meaning the same Sum point value could be obtained through many different combinations of metric event values. This property can also be applied in reverse, meaning that Sum points can be subdivided."

therefore Sum points can be subdivided. A single Counter event can be
logically replaced by multiple Counter events having an equal sum.
This property allows the producer of metric events to introduce new

maybe: "introduce new measurements" => "introduce new attributes that subdivide measurements"? (maybe that's the wrong terminology though)

measurements, while preserving existing interpretation.

For example, it is reasonable to replace a single Counter event adding `x+y`:

```
counter.Add(x+y)
```

with separate counter events and one additional attribute:

```
counter.Add(x, { 'property': 'X' })
counter.Add(y, { 'property': 'Y' })
```

This property for Sum points makes it possible to configure an
instrumentation library with or without subdivided Sums and to
meaningfully aggregate data with a mixture of attributes.
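
The following Python sketch checks this property for the example above. It is an illustration only: `sum_by` is a hypothetical helper, events are modelled as `(value, attributes)` pairs, and the values chosen for `x` and `y` are arbitrary.

```python
from collections import defaultdict


def sum_by(events, keys):
    """Re-aggregate Counter events by adding values that share the kept keys."""
    totals = defaultdict(int)
    for value, attributes in events:
        group = tuple((k, attributes[k]) for k in keys if k in attributes)
        totals[group] += value
    return dict(totals)


x, y = 3, 4
undivided = [(x + y, {})]
subdivided = [(x, {"property": "X"}), (y, {"property": "Y"})]

# Ignoring the new attribute recovers the original total.
print(sum_by(undivided, []))   # {(): 7}
print(sum_by(subdivided, []))  # {(): 7}

# Keeping it exposes the subdivision.
print(sum_by(subdivided, ["property"]))
```

A consumer that never looks at `property` sees the same total from either form, which is what allows the attribute to be introduced without breaking existing interpretations.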

## New measurements: Gauge instruments

Nice! This section is super clear and easy to understand.

nit: "New measurements" => "Adding new measurements" for clarity? Same below.


Gauge instruments, unlike Counter instruments, cannot be subdivided.
Multiple Gauge measurements cannot be meaningfully combined using
addition. In the time dimension, Gauge instrument events are
aggregated by taking the last value.

The same aggregation can be applied when removing an attributes from
metric streams forces reaggregation. The most current value should be
selected. In case of identical timestamps, a random value should be

The random value sample for time collisions is an important decision that should be clearly documented. There's a lot of behavior that surfaces when this happens that is difficult to understand. I do think it's a good solution, just needs to be clearly documented.

nit: removing an attributes => either "removing an attribute" or "removing attributes"

selected to preserve the meaning of the Gauge.

For example, a Gauge for expressing a vehicle's speed relative to the
ground can be expressed either as the speed of its midpoint or by an
independent measurement of the speed of each wheel.

```
speedGauge.Set(vehicleSpeed)
```

This can be replaced by one Gauge per wheel, since wheel speed and
vehicle speed each define vehicle speed relative to the ground:

```
for i := 0; i < 4; i++ {
    speedGauge.Set(wheelSpeed[i], { 'wheel': i })
}
```

This form of Gauge rewrite is generally useful to capture additional
measurements by creating distinct metric streams.
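
A minimal Python sketch of the last-value rule, applied when the `wheel` attribute is erased, might look as follows. The helper `last_value`, the `(timestamp, value)` point layout, and the speeds are all assumptions made for illustration; timestamp ties are broken at random, as the text above describes.

```python
import random


def last_value(points, rng=random):
    """Pick the most recent Gauge value; break timestamp ties at random."""
    latest = max(timestamp for timestamp, _ in points)
    candidates = [value for timestamp, value in points if timestamp == latest]
    return rng.choice(candidates)


# Hypothetical (timestamp, speed) points, one list per wheel.
wheel_points = {
    0: [(1, 20.0), (2, 21.0)],
    1: [(1, 20.5), (3, 21.5)],
    2: [(2, 20.8)],
    3: [(1, 19.9), (2, 21.2)],
}

# Erasing the 'wheel' attribute merges the four streams into one.
merged = [point for points in wheel_points.values() for point in points]
print(last_value(merged))  # 21.5, the only point with the latest timestamp
```

Adding the four wheel speeds would produce a number that is not a vehicle speed, whereas the selected value is still a valid observation of it.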

## Meaning-preserving attribute erasure

Several rules for rewriting metric events that preserve meaning have
been shown above, focused on introducing new attributes and new
measurements in ways that do not change existing meaning or alter existing
interpretations.

Removing attributes from metric events does not, by definition, change
their meaning, since attributes are interpreted as event selectors.
Removing attributes from aggregated streams of OpenTelemetry Metrics
data requires attention to the meaning being conveyed.

Safe attribute erasure for OpenTelemetry Metrics streams is specified
in a way that preserves meaning while removing only the forms of

What if "removing the forms of interpretation that made use of the erased attribute" basically is destroying the practical value of the metric?

I think this is implying that removing an attribute from a metric is an ok thing to do (it's not). The meaning that IS preserved is relevant to reaggregation + processing, but could be disastrous to dashboards and users.

interpretation that made use of the erased attribute.

_Reaggregation_ describes the process of combining OpenTelemetry
metric streams. For reaggregation to preserve meaning, Sum points
must be combined by adding the inputs and Gauge points must be
combined by selecting the last or random value.

Note that erasure of attributes is defined so that it reverses the
effect of introducing new measurements, and meaning is preserved in
both directions. This explains the definition for default
aggregations that should be applied when re-aggregating OpenTelemetry
metrics streams. Sum streams are re-aggregated to preserve the
implied rate, while Gauge points are re-aggregated to preserve the
implied distribution of individual values.
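
The erasure rule for both point kinds can be sketched in Python as below. This is an assumption-laden illustration, not collector code: `erase_attribute` is a hypothetical function, each stream is reduced to a single `(timestamp, value)` point, and on a Gauge timestamp tie the sketch keeps the first point seen instead of choosing at random.

```python
def erase_attribute(streams, key, point_kind):
    """Drop `key` from every stream and merge the streams that collide.

    `streams` maps a frozenset of (key, value) attribute pairs to one
    (timestamp, value) point. Sum points are added; for Gauge points
    the later point wins.
    """
    if point_kind not in ("sum", "gauge"):
        raise ValueError("point_kind must be 'sum' or 'gauge'")
    merged = {}
    for attributes, (timestamp, value) in streams.items():
        reduced = frozenset((k, v) for k, v in attributes if k != key)
        if reduced not in merged:
            merged[reduced] = (timestamp, value)
            continue
        seen_timestamp, seen_value = merged[reduced]
        if point_kind == "sum":
            merged[reduced] = (max(seen_timestamp, timestamp), seen_value + value)
        elif timestamp > seen_timestamp:
            merged[reduced] = (timestamp, value)
    return merged


streams = {
    frozenset({("host", "a"), ("property", "X")}): (10, 3),
    frozenset({("host", "a"), ("property", "Y")}): (12, 4),
}
host_a = frozenset({("host", "a")})

print(erase_attribute(streams, "property", "sum")[host_a])    # (12, 7)
print(erase_attribute(streams, "property", "gauge")[host_a])  # (12, 4)
```

The same input therefore yields different results depending on the point kind, which is why the choice of instrument determines whether erasure preserves the intended meaning.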

## Conveying meaning to the user

OpenTelemetry states a requirement separating the API from the
implementation, and to do so we have defined the meaning of metrics
API events. To preserve meaning through stages of reaggregation, we
have specified distinct default aggregation rules for Counter and
Gauge streams.

When attributes are used with Counter and Gauge instruments, every
distinct combination of attribute values determines a separate
OpenTelemetry metrics stream, and each stream conveys meaning
independently. Because meaning is independent from the attributes
used, the user may wish to disregard some attributes when interpreting
a stream of metrics, restricting their attention to a subset of
attributes.

In database systems, this process is referred to as performing a
"Group-By", where aggregation is used to combine streams within each
distinct set of grouped attributes. For the benefit of OpenTelemetry

This kind of falls over for me in practice. Database Group-By is a full query language where NORMALLY you need to provide your own Aggregation semantic for every column of data or the query will fail.

What we're suggesting here is instead of:

SELECT AVG(count) FROM x GROUP BY label

The default aggregation function is inferred by OTel based on the metric type.

Two things:

1. We shouldn't prevent users from using different aggregation functions in practice. Indeed, once they make it to PromQL (or MQL for GCP), they'll be able to do whatever they want here and need to track their own meaning.
2. I assume a lot of the gymnastics we go through around default aggregation is to avoid exposing a query-like language for "rewrite rule" style collection behavior.

My $.02 here is the focus on giving users a way to solve "rewrite rules" use cases is good. Making it as easy as possible is good. If we can't explain what is and isn't safe in very simple terms, we might be in trouble. If the "meaning" we retain isn't the one users wanted to retain, then we're not really adding value.

users, Metrics systems are encouraged to choose a meaning-preserving
aggregation when grouping metric streams to convey meaning to the
user.

When conveying meaning to the user by grouping and aggregating over a
subset of attribute keys, the default aggregation selected should be
one that preserves meaning. For monotonic Counter instruments, this
means conveying the combined rate of each group. For UpDownCounter

Why does monotonic counter instruments turn into a rate? Did I miss an above description on how UpDown vs. Counter have different aggregation meaning?

Maybe this description?

instruments, this means conveying the combined total of each group.
For Gauge instruments, this means conveying the combined distribution
of each group.

## Choice of UpDownCounter or Gauge

The OpenTelemetry UpDownCounter instrument resembles the Gauge
instrument, but streams generated from these instruments apply
different aggregation rules by default. The choice of instrument
should be made to ensure that the default aggregation rule preserves
meaning, as that is the point of these definitions.

Examining Gauge instruments in existing systems for anecdotal evidence
suggests that a significant majority of Gauges should be written as
UpDownCounters in OpenTelemetry. Examples are given below.

### UpDownCounter measurements

UpDownCounter instruments are used for capturing quantities, where
typical examples include:

- Queue size
- Memory size
- Cache size
- Active requests
- Live object count

To test that these quantities are suitable UpDownCounter measurements,
verify that adding two inputs together logically produces another of
the same type and scale of measurement. A queue size plus a queue
size yields a queue size, for example; add one count of live objects
with another, and you have a count of live objects. By choosing the
UpDownCounter, developers ensure that the meaning conveyed is a sum,
which ensures the correct rate interpretation.

When interpreting total sums aggregated from UpDownCounter
instruments, it is important to consider the set of contributing
attributes, which determine the scale of measurement. If one server
outputs UpDownCounter data in two attribute dimensions while another
uses three attribute dimensions, the mean value is not a meaningful
quantity. The process of correcting mixed attribute dimensions for
cumulative sums is referred to as _dimensional alignment_.
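
A small Python sketch shows why alignment matters before taking a mean. The helper `align`, the attribute keys (`queue`, `priority`, `shard`), and the values are hypothetical; the only assumption carried over from the text is that UpDownCounter points are combined by addition.

```python
from collections import defaultdict


def align(points, keep_keys):
    """Sum UpDownCounter points down to the shared attribute keys."""
    aligned = defaultdict(int)
    for attributes, value in points:
        aligned[tuple(attributes[k] for k in keep_keys)] += value
    return dict(aligned)


# Server A reports queue size in two dimensions; server B adds a third.
server_a = [({"queue": "jobs", "priority": "high"}, 10)]
server_b = [
    ({"queue": "jobs", "priority": "high", "shard": "0"}, 4),
    ({"queue": "jobs", "priority": "high", "shard": "1"}, 8),
]

shared = ("queue", "priority")
a = align(server_a, shared)
b = align(server_b, shared)
key = ("jobs", "high")

# After alignment each server contributes one value per group.
mean_per_server = (a[key] + b[key]) / 2
print(mean_per_server)  # 11.0

# Without alignment the mean is taken over three unlike points.
raw = [value for _, value in server_a + server_b]
print(sum(raw) / len(raw))  # about 7.33, not a per-server queue size
```

The total (22) is the same either way; it is the mean, which depends on how many points contribute, that requires aligned dimensions.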

### Gauge measurements

Gauge instruments are used for capturing physical measurements,
calculated ratios, and results of function evaluation. For example:

- CPU utilization
- CPU temperature
- Fan speed
- Water pressure
- Success/failure ratio

To test that these are suitable Gauge measurements, verify that adding
two inputs together does not logically produce a measurement of the
same type.

A CPU utilization plus a CPU utilization cannot meaningfully be used
as a measure of CPU utilization, it is just the sum of two CPU
utilizations.

A fan speed plus a fan speed has the correct units (a fan speed), but
the result is not a meaningful quantity. Two fans spinning at one
speed is not the same as one fan spinning at twice the speed.

In some of these cases, it may be logical but practically impossible
to use one or more Counter instruments in place of Gauges. CPU
utilization can be derived from a usage Counter. Fan speed can be
derived from a revolution Counter.
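
As a rough Python sketch of the first derivation, utilization can be computed from two samples of a cumulative CPU usage Counter. The function `utilization`, the `(wall_clock_seconds, cumulative_cpu_seconds)` sample layout, and the numbers are assumptions made for illustration:

```python
def utilization(previous, current, cores=1):
    """Fraction of available CPU time used between two cumulative samples.

    Each sample is (wall_clock_seconds, cumulative_cpu_seconds).
    """
    (t0, used0), (t1, used1) = previous, current
    return (used1 - used0) / ((t1 - t0) * cores)


# 5 CPU-seconds used over 10 wall-clock seconds on a 2-core machine.
print(utilization((100.0, 40.0), (110.0, 45.0), cores=2))  # 0.25
```

The underlying usage values add meaningfully (CPU-seconds from two hosts sum to CPU-seconds), while the derived ratio does not, which matches the test given above for choosing between a Counter and a Gauge.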

## Summary

The OpenTelemetry Metrics data model supports addition and removal of
attributes in a way that preserves meaning. This design gives
developers the ability to introduce new attributes in a safe way.

OpenTelemetry metrics developers are asked to consider whether they
want an UpDownCounter or Gauge when making asynchronous measurements,
and they should make this decision based on whether the default
aggregation rule for UpDownCounter or Gauge preserves meaning. This
decision comes down to whether attributes are meant to subdivide a Sum
point or qualify a Gauge point.

The default aggregation rules for OpenTelemetry metrics data points
ensure that meaning is preserved when removing attributes from a
stream of metrics data. The rules for reaggregation specify that
attributes should be safely removed before aggregating with other
metrics that are missing the same attributes, a process referred to as
dimensional alignment.

This design allows optional attributes to be included by the SDK in
metric data when they are available, such as those extracted from
TraceContext Baggage, in ways that consumers of the metrics data can
interpret correctly.

Having the ability to automatically remove attributes without changing
the meaning of Counter, UpDownCounter, and Gauge metrics API events
makes it possible for OpenTelemetry collectors to be configured with
re-aggregation rules, which can be managed by users in order to limit
collection costs.