Review design: synchronous sampling #45

Open
adinapoli opened this issue Sep 30, 2022 · 0 comments
For historical reasons, ridley allows metrics to be sampled independently from when the relevant Prometheus endpoint is hit; this decoupling is convenient in some respects (like avoiding too-frequent sampling of expensive metrics), but it's not what the Prometheus best practices advocate, e.g.:

Metrics should only be pulled from the application when Prometheus scrapes them, exporters should not perform scrapes based on their own timers. That is, all scrapes should be synchronous.

Another downside is that clients pulling the metrics (like Grafana) wouldn't necessarily see the most up-to-date value for those metrics if the "internal" sampling timer hasn't fired yet. Furthermore, Grafana's default scraping interval is 60s, so if anything the default sampling interval of 5s that ridley uses is far too aggressive.

Considering all of that, it would be nice for ridley to comply with Prometheus' guidelines on writing exporters, with particular attention to scheduling.

This might require some reworking of ridley's internals. It has been a while since ridley's inception, but essentially ridley uses the Haskell prometheus library, which exposes a bunch of functions for scraping, most of which allow passing an IO action that performs the sampling for each new incoming request.
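
Roughly, the wiring looks like the sketch below (module and function names are from memory, so they may not match the exact prometheus version ridley pins; treat it as illustrative):

import System.Metrics.Prometheus.Concurrent.Registry (Registry, sample)
import System.Metrics.Prometheus.Http.Scrape         (serveMetrics)

-- serveMetrics :: MonadIO m => Port -> Path -> IO RegistrySample -> m ()
-- The IO RegistrySample argument is re-run for every incoming scrape request,
-- so whatever we pass here is our one hook into the scrape lifecycle.
serveRegistry :: Registry -> IO ()
serveRegistry registry = serveMetrics 8080 ["metrics"] (sample registry)

If we want synchronous scrapes, that IO action is where the per-request sampling work has to happen.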

Currently ridley uses serveMetrics with sample as this IO action, which as far as I can see is not really synchronous: yes, it reads the values of counters and gauges synchronously, but the library expects its users to keep those counters up to date "out-of-band", as per the README:

module Example where

import           Control.Monad.IO.Class                         (liftIO)
import           System.Metrics.Prometheus.Http.Scrape          (serveMetricsT)
import           System.Metrics.Prometheus.Concurrent.RegistryT
import           System.Metrics.Prometheus.Metric.Counter       (inc)
import           System.Metrics.Prometheus.MetricId

main :: IO ()
main = runRegistryT $ do
    -- Labels can be defined as lists or added to an empty label set
    connectSuccessGauge <- registerGauge "example_connections" (fromList [("login", "success")])
    connectFailureGauge <- registerGauge "example_connections" (addLabel "login" "failure" mempty)
    connectCounter <- registerCounter "example_connection_total" mempty
    latencyHistogram <- registerHistogram "example_round_trip_latency_ms" mempty [10, 20..100]

    liftIO $ inc connectCounter -- increment a counter

    -- [...] pass metric handles to the rest of the app

    serveMetricsT 8080 ["metrics"] -- http://localhost:8080/metrics server

As you can see, the advocated approach seems to be to pass those counter "handles" around the app, which is responsible for keeping them updated, while prometheus' sample just takes a snapshot of their current values.
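
For contrast, ridley's current behaviour boils down to the "own timer" pattern Prometheus warns against: a background thread refreshing the handles on its own schedule, independently of scrapes. A minimal sketch (illustrative only; the probe and interval are made up, and the Gauge API is quoted from memory):

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, void)
import System.Metrics.Prometheus.Metric.Gauge (Gauge, set)

-- Out-of-band sampling: a timer thread refreshes the gauge every 5 seconds,
-- regardless of whether /metrics is ever scraped in that window.
sampleEvery5s :: Gauge -> IO Double -> IO ()
sampleEvery5s gauge expensiveProbe = void . forkIO . forever $ do
    v <- expensiveProbe
    set v gauge
    threadDelay 5000000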

I guess that in order to write a compliant exporter we should provide our own IO RegistrySample that runs each updateMetricHandler on every new HTTP request, rather than relying on "just reading the counters and gauges".
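
Something along these lines (a sketch only; syncSample and the list of update actions are hypothetical stand-ins for ridley's internal metric handlers, and import locations are from memory):

import System.Metrics.Prometheus.Concurrent.Registry (Registry, sample)
import System.Metrics.Prometheus.Http.Scrape         (serveMetrics)
import System.Metrics.Prometheus.Registry            (RegistrySample)

-- Run every metric-updating action first, then snapshot the registry, so the
-- values served for this scrape are computed synchronously with the request.
syncSample :: Registry -> [IO ()] -> IO RegistrySample
syncSample registry updateHandlers = do
    sequence_ updateHandlers
    sample registry

serveSynchronously :: Registry -> [IO ()] -> IO ()
serveSynchronously registry updateHandlers =
    serveMetrics 8080 ["metrics"] (syncSample registry updateHandlers)

With something like that in place, the internal 5s timer could presumably go away entirely, since every scrape would pay for its own sampling.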

How the whole HELP story and caching relate to this, I'm not sure yet.
