Review design: synchronous sampling #45

Open
adinapoli opened this issue Sep 30, 2022 · 0 comments
For historical reasons, ridley allows metrics to be sampled independently from when the relevant Prometheus endpoint is hit; this decoupling is convenient in some respects (like avoiding too-frequent sampling of expensive metrics), but it's not what the Prometheus best practices advocate, e.g.:

Metrics should only be pulled from the application when Prometheus scrapes them, exporters should not perform scrapes based on their own timers. That is, all scrapes should be synchronous.

Another downside is that clients pulling the metrics (like Grafana) wouldn't necessarily see the most up-to-date value for those metrics if the "internal" sampling timer hasn't fired yet. Furthermore, Grafana's default scraping interval is 60s, so if anything the default sampling interval of 5s that ridley uses is far too aggressive.

Considering all of that, it would be nice for ridley to comply with Prometheus' guidelines on writing exporters, with particular attention to scheduling.

This might require some reworking of ridley's internals. It has been a while since ridley's inception, but essentially ridley uses the Haskell prometheus library, which exposes a bunch of functions for scraping, most of which allow passing an IO action that performs the sampling for each new incoming request.
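
Roughly, the wiring looks like the sketch below (module and function names are from memory, so they may not match the exact prometheus version ridley pins; treat it as illustrative):

import System.Metrics.Prometheus.Concurrent.Registry (Registry, sample)
import System.Metrics.Prometheus.Http.Scrape         (serveMetrics)

-- serveMetrics :: MonadIO m => Port -> Path -> IO RegistrySample -> m ()
-- The IO RegistrySample argument is re-run for every incoming scrape request,
-- so whatever we pass here is our one hook into the scrape lifecycle.
serveRegistry :: Registry -> IO ()
serveRegistry registry = serveMetrics 8080 ["metrics"] (sample registry)

If we want synchronous scrapes, that IO action is where the per-request sampling work has to happen.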

Currently ridley uses serveMetrics with sample as this IO action, which as far as I can see is not really synchronous: yes, it reads the values of counters and gauges synchronously, but the library expects its users to keep those counters up to date "out-of-band", as per the README:

module Example where

import           Control.Monad.IO.Class                         (liftIO)
import           System.Metrics.Prometheus.Http.Scrape          (serveMetricsT)
import           System.Metrics.Prometheus.Concurrent.RegistryT
import           System.Metrics.Prometheus.Metric.Counter       (inc)
import           System.Metrics.Prometheus.MetricId

main :: IO ()
main = runRegistryT $ do
    -- Labels can be defined as lists or added to an empty label set
    connectSuccessGauge <- registerGauge "example_connections" (fromList [("login", "success")])
    connectFailureGauge <- registerGauge "example_connections" (addLabel "login" "failure" mempty)
    connectCounter <- registerCounter "example_connection_total" mempty
    latencyHistogram <- registerHistogram "example_round_trip_latency_ms" mempty [10, 20..100]

    liftIO $ inc connectCounter -- increment a counter

    -- [...] pass metric handles to the rest of the app

    serveMetricsT 8080 ["metrics"] -- http://localhost:8080/metrics server

As you can see, the advocated approach seems to be to pass those counter "handles" around the app, which is responsible for keeping them updated, while prometheus' sample just takes a snapshot of their current values.
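
For contrast, ridley's current behaviour boils down to the "own timer" pattern Prometheus warns against: a background thread refreshing the handles on its own schedule, independently of scrapes. A minimal sketch (illustrative only; the probe and interval are made up, and the Gauge API is quoted from memory):

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, void)
import System.Metrics.Prometheus.Metric.Gauge (Gauge, set)

-- Out-of-band sampling: a timer thread refreshes the gauge every 5 seconds,
-- regardless of whether /metrics is ever scraped in that window.
sampleEvery5s :: Gauge -> IO Double -> IO ()
sampleEvery5s gauge expensiveProbe = void . forkIO . forever $ do
    v <- expensiveProbe
    set v gauge
    threadDelay 5000000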

I guess that in order to write a compliant exporter we should provide our own IO RegistrySample that runs each updateMetricHandler on every new HTTP request, rather than relying on "just reading the counters and gauges".
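
Something along these lines (a sketch only; syncSample and the list of update actions are hypothetical stand-ins for ridley's internal metric handlers, and import locations are from memory):

import System.Metrics.Prometheus.Concurrent.Registry (Registry, sample)
import System.Metrics.Prometheus.Http.Scrape         (serveMetrics)
import System.Metrics.Prometheus.Registry            (RegistrySample)

-- Run every metric-updating action first, then snapshot the registry, so the
-- values served for this scrape are computed synchronously with the request.
syncSample :: Registry -> [IO ()] -> IO RegistrySample
syncSample registry updateHandlers = do
    sequence_ updateHandlers
    sample registry

serveSynchronously :: Registry -> [IO ()] -> IO ()
serveSynchronously registry updateHandlers =
    serveMetrics 8080 ["metrics"] (syncSample registry updateHandlers)

With something like that in place, the internal 5s timer could presumably go away entirely, since every scrape would pay for its own sampling.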

How the whole HELP story and caching relate to this, I'm not sure yet.
