Strengthen requirements for aggregators
jmacd committed Nov 13, 2019
1 parent b04e927 commit 401292b
Showing 1 changed file with 102 additions and 29 deletions: specification/sdk-metric.md
system-imposed requirements. A _dimensionality reduction_ maps input
LabelSets with (potentially) a large number of labels into a smaller
LabelSet containing only labels for an explicit set of label keys.
Performing dimensionality reduction in a metrics export pipeline
generally means merging aggregators computed for original LabelSets
into a single combined aggregator for the reduced-dimension LabelSet.

__Export record__: The _Export record_ is an exporter-independent
in-memory representation combining the metric instrument, the LabelSet
for export, and the associated (checkpointed) aggregator containing
its state. Metric instruments are described by a metric descriptor.

__Metric descriptor__: A _metric descriptor_ is an in-memory
collection. Batcher and Exporter implementations are written with the
assumption that collection is single-threaded, therefore the Meter
implementation MUST prevent concurrent `Collect()` calls. During the
collection pass, the Meter implementation checkpoints each active
aggregator and passes it to the Batcher for processing.

This document does not specify how to coordinate synchronization
between user-facing metric updates and metric collection activity,
updates.

The Meter acts as a short-term store for aggregating metric updates
within a collection period. The Meter implementation maintains
aggregators for active metric instruments according to the complete,
original LabelSet. This ensures a relatively simple code path for
entering metric updates into the Meter implementation.

every record either: (1) has a current, un-released handle pinning it
in memory, (2) has pending updates that have not been collected, or (3)
is a candidate for removal from memory. The Meter maintains a
mapping from the pair (Instrument, LabelSet) to an active record.
Each active record contains an aggregator implementation, which is
responsible for incorporating a series of metric updates into the
current state.
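The record map described above can be sketched as follows. This is an illustrative assumption, not part of the specification: the `Record` and `MeterState` names, and the lock-protected dictionary, are stand-ins for whatever structure an SDK actually uses.

```python
import threading


class Record:
    """One active (instrument, LabelSet) entry holding an aggregator."""

    def __init__(self, aggregator):
        self.aggregator = aggregator


class MeterState:
    """Sketch of the Meter's short-term store:
    (instrument, label set) -> active record."""

    def __init__(self, aggregator_for):
        self._lock = threading.Lock()
        self._records = {}
        # Policy hook (supplied by the Batcher) choosing an aggregator
        # implementation for each instrument.
        self._aggregator_for = aggregator_for

    def record_for(self, instrument, label_set):
        # label_set must be hashable, e.g. a sorted tuple of pairs.
        key = (instrument, label_set)
        with self._lock:
            record = self._records.get(key)
            if record is None:
                record = Record(self._aggregator_for(instrument))
                self._records[key] = record
            return record
```

Repeated updates for the same (instrument, LabelSet) pair then reach the same record, keeping the update path simple as the text describes.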

active records.
## Aggregator implementations

The Aggregator interface supports combining multiple metric events
into a single aggregated state. Different concrete aggregator types
provide different functionality and levels of concurrent performance.

Aggregators support `Update()`, `Checkpoint()`, and `Merge()`.
`Update()` is called directly from the Meter in response to a metric
event, and may be called concurrently. `Update()` is also passed the
user's telemetry context, which allows it to access the current trace
context and distributed correlations; however, none of the built-in
aggregators use this information.

The `Checkpoint()` operation is called to atomically save a snapshot
of the aggregator, since `Checkpoint()` may be called concurrently
with `Update()`. The `Merge()` operation supports dimensionality
reduction by combining state from multiple aggregators into a single
aggregator state.
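The three operations can be summarized as an abstract interface. The sketch below follows the text; the exact signatures are illustrative assumptions, not normative.

```python
from abc import ABC, abstractmethod


class Aggregator(ABC):
    """Sketch of the Aggregator interface described above."""

    @abstractmethod
    def update(self, value, context=None):
        """Incorporate one metric event; may be called concurrently.
        Receives the user's telemetry context, which the built-in
        aggregators ignore."""

    @abstractmethod
    def checkpoint(self):
        """Atomically snapshot the current state for the collection pass."""

    @abstractmethod
    def merge(self, other):
        """Fold another aggregator's checkpointed state into this one,
        supporting dimensionality reduction."""
```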

The Metric SDK SHOULD include six built-in aggregator types. Two
standard aggregators MUST be included, implementing the counter and
gauge aggregations.

1. Counter: This aggregator MUST maintain a Sum. In languages with
support for atomic operations, the Counter aggregator SHOULD be
implemented using only a single word of memory for the current state
and a single word of memory for its checkpoint.
1. Gauge: This aggregator MUST maintain the last value and its
timestamp. In languages with support for atomic operations, this
aggregator's update operation SHOULD be implemented by a single memory
allocation--to store the value and timestamp--followed by an atomic
pointer swap; if the gauge is defined as monotonic, the update SHOULD
use an atomic compare-and-swap to ensure monotonicity.
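A minimal sketch of these two aggregators follows. Python lacks the word-sized atomics the text describes, so a mutex stands in for the atomic operations, and the class names are assumptions for illustration.

```python
import threading
import time


class CounterAggregator:
    """Maintains a Sum; checkpoint() snapshots and resets the current
    state (a delta-style checkpoint; a cumulative variant is equally
    possible)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.checkpointed = 0

    def update(self, value):
        with self._lock:  # one atomic add in an atomic-capable language
            self._current += value

    def checkpoint(self):
        with self._lock:
            self.checkpointed = self._current
            self._current = 0

    def merge(self, other):
        self.checkpointed += other.checkpointed


class GaugeAggregator:
    """Maintains the last value and its timestamp as a unit; the tuple
    replacement stands in for the atomic pointer swap."""

    def __init__(self, monotonic=False):
        self._lock = threading.Lock()
        self._monotonic = monotonic
        self._state = (None, None)  # (value, timestamp)
        self.checkpointed = (None, None)

    def update(self, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        with self._lock:
            last, _ = self._state
            # A monotonic gauge would use compare-and-swap here to
            # reject regressions.
            if self._monotonic and last is not None and value < last:
                return
            self._state = (value, ts)

    def checkpoint(self):
        with self._lock:
            self.checkpointed = self._state
```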

Aggregators for [Measure metric instruments](api-metrics.md#measure)
are more challenging in nature than the Counter and Gauge aggregators,
since their task is to aggregate a series of individual measurements.
To perform this duty exactly requires storing the entire set of
measurements, which may be cost-prohibitive. The common mechanisms
for exporting recorded measurements from a Measure metric instrument
are: as a series of raw measurements, as a summary of pre-determined
quantiles, and as a histogram with pre-determined boundaries. A
definition for _Quantile_ is given below.

Four aggregators SHOULD be provided for use with Measure metric
instruments, covering these common export mechanisms with a range of
performance options.

1. MinMaxSumCount: This aggregator is intended as an inexpensive
alternative to the Sketch, Histogram, and Exact aggregators for
Measure instruments. This aggregator MUST compute the min, max, sum,
and count of recorded measurements. In languages with support for
atomic operations, this aggregator's update operation SHOULD maintain
its state using four independent atomic updates. In this case, the
aggregator's update operation SHOULD NOT be atomic with respect to its
checkpoint operation, implying that a checkpoint could witness an
inconsistent state; that is intentional given the inexpensive nature
of this aggregator.
1. Sketch: This aggregator computes an approximate data structure that
MUST estimate quantiles of the distribution of recorded measurements.
Example algorithms that could be used to implement this aggregator
include GK-Sketch, DDSketch, Q-Digest, T-Digest, and HDR-Histogram.
The choice of algorithm should be made based on available libraries in
each language, but implementations with well-defined error bounds
SHOULD be preferred.
1. Histogram: This aggregator MUST compute a histogram with
pre-determined boundaries. This aggregator MAY support quantile
estimation, but is generally intended for cases where a histogram will
be exported directly and the exporter wants explicit control over
histogram boundaries.
1. Exact: This aggregator MUST store an array of all recorded
measurements. This aggregator MUST support exact quantile
computations and it MUST support exporting raw values in the order
they were recorded; however, it is not required to support both of
these modes simultaneously (since computing quantiles requires sorting
the measurements).
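The MinMaxSumCount state can be sketched as below. As the text permits, the update operation is deliberately not atomic with respect to the checkpoint: each field is updated independently, so a concurrent checkpoint may observe a mix of pre- and post-update values. The class name is an assumption for illustration.

```python
import math


class MinMaxSumCount:
    """Four independent fields; in an atomic-capable language each
    field would be maintained with its own atomic operation."""

    def __init__(self):
        self.min = math.inf
        self.max = -math.inf
        self.sum = 0.0
        self.count = 0

    def update(self, value):
        if value < self.min:
            self.min = value
        if value > self.max:
            self.max = value
        self.sum += value
        self.count += 1

    def checkpoint(self):
        # Intentionally not synchronized with update(); see the text.
        return (self.min, self.max, self.sum, self.count)

    def merge(self, other):
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self.sum += other.sum
        self.count += other.count
```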

### Quantile definition

When exporting a summary of recorded measurements for a Measure metric
instrument, it is common to report _quantiles_ of the distribution.
When computing quantiles from an exact aggregation (i.e., the complete
data set), the "nearest rank" definition of quantile SHOULD be used.
The nearest-rank definition ensures that the resulting value belongs
to the original data set. Interpolation is not used in this method.

The definition for the nearest-rank quantile given here makes use of
the _cumulative distribution function_, a standard concept from
probability theory. Quantiles are parameterized by `q`, where `0 <= q
<= 1`. The value for quantile `q` is the least element of the
original data set at or above the point where the cumulative
distribution function equals `q`.

For example, taking a data set of five values `{10, 20, 30, 40, 50}`,
the `q=0.5` quantile (i.e., the median) equals 30, which is precisely
the point where the cumulative distribution function equals 0.5.

With an even-sized data set, for example `{10, 20, 30, 40}`, the
`q=0.5` quantile equals 30. In this case, the cumulative distribution
function equals 0.5 halfway between 20 and 30 and the greater value is
selected as the nearest rank.

When using an approximate aggregator to compute estimated quantile
values, the nearest-rank quantile definition does not apply.
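The worked examples above correspond to a midpoint convention for the cumulative distribution function, in which the i-th of n sorted values sits at `(i - 0.5) / n`. Under that assumption (the convention is inferred from the examples, not stated normatively), the nearest-rank quantile can be sketched as:

```python
import math


def nearest_rank_quantile(values, q):
    """Least element of the data set at or above the point where the
    (midpoint-convention) cumulative distribution function equals q.
    No interpolation: the result is always a member of the data set."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need a non-empty data set and 0 <= q <= 1")
    data = sorted(values)
    n = len(data)
    # Smallest 1-based rank i with (i - 0.5) / n >= q, clamped to [1, n].
    rank = min(n, max(1, math.ceil(q * n + 0.5)))
    return data[rank - 1]
```

This reproduces both examples: the median of `{10, 20, 30, 40, 50}` is 30, and the median of `{10, 20, 30, 40}` is also 30, the greater of the two middle values.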

## Batcher implementation

The Batcher acts as the primary source of configuration for exporting
metrics from the SDK. The two kinds of configuration are:

1. Given a metric instrument, choose which concrete aggregator type to apply for in-process aggregation.
1. Given a metric instrument, choose which dimensions to export by (i.e., the "grouping" function).

The first choice--which concrete aggregator type to apply--is made
whenever the Meter implementation encounters a new (Instrument,
LabelSet) pair. Each concrete type of aggregator will perform a
different function. Aggregators for counter and gauge instruments are
relatively straightforward, but many concrete aggregators are possible
for measure metric instruments. The Batcher has an opportunity to
disable instruments at this point simply by returning a `nil`
aggregator.
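This first choice can be sketched as an aggregator-selection hook on the Batcher. The function name, the descriptor shape, and the placeholder return values are all assumptions for illustration; a real SDK would return instances of the concrete aggregator types.

```python
def aggregator_for(descriptor):
    """Return a new aggregator for the instrument, or None to disable it.

    `descriptor` is assumed to carry the instrument kind; the dicts
    returned here are placeholders for concrete aggregator objects.
    """
    kind = descriptor.get("kind")
    if kind == "counter":
        return {"type": "counter", "sum": 0}
    if kind == "gauge":
        return {"type": "gauge", "last": None}
    if kind == "measure":
        # A cost-conscious Batcher might choose MinMaxSumCount here
        # instead of a Sketch, Histogram, or Exact aggregator.
        return {"type": "minmaxsumcount"}
    return None  # unknown or deliberately disabled instrument
```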

The second choice--which dimensions to export by--affects how the
batcher processes records emitted by the Meter implementation during
export record for each metric instrument with pending updates to the
Batcher.

During the collection pass, the Batcher receives a full set of
checkpointed aggregators corresponding to each (Instrument, LabelSet)
pair with an active record managed by the Meter implementation.
According to its own configuration, the Batcher at this point
determines which dimensions to aggregate for export; it computes a
The metric export pipeline specified here does not include explicit
support for multiple export pipelines. In principle, any one of the
interfaces here could be satisfied by a multiplexing implementation,
but in practice, it will be costly to run multiple Batchers or
aggregators in parallel.

If multiple exporters are required, therefore, it is best if they can
share a single Batcher configuration.

The Meter implementation and some Batcher implementations are required
to compute a unique key corresponding to a LabelSet, for the purposes
of locating an aggregator to use for metric updates. Where possible,
Exporters can avoid a duplicate computation by providing a
LabelEncoder to the Meter implementation.

