Allow to unregister/stop/destroy instruments #2232
Comments
(Following the https://github.com/open-telemetry/opentelemetry-specification/blob/main/CONTRIBUTING.md#issue-triaging process.) @jsuereth I am reassigning this to you; I believe you know the context of this better. |
@mateuszrzeszutek micrometer is an API + Some implementation(s). I am not sure why that should translate into calls to our API instead of offering an implementation of micrometer that implements the /cc @jsuereth |
@bogdandrutu Micrometer was just one example but do you think apps should not have a way to stop reporting metrics? Dropwizard metrics in Java also has |
There are advantages to calling OTel API in micrometer bridge instrumentation:
Also, micrometer aside, the unregister/remove functionality is still very much needed. The database connection pool example from my first post here is still valid. |
Just another note about Micrometer, I expect it to be used for a very long time by Spring and likely Spring users too as a result. For example, for tracing they will use Micrometer Tracing, not the OpenTelemetry tracing API. I think the reality of the Java ecosystem means we should consider Micrometer as a first class API for Java metrics, and this seems OK to me in practice. As such, I'd like Micrometer usage to benefit from features like exemplars / baggage which means going through our API (or well our SDK implementations directly maybe, but not |
It is just a representation difference, it is still a Histogram, I don't see the point.
The micrometer instrumentation is not really an instrumentation; it is actually an SDK/Producer, whatever you call it, since micrometer is an API + Impl to produce metrics. I am not sure any metrics library should be seen as "instrumentation", rather than as a producer of telemetry.
I am not saying to not have an integration (not instrumentation) with micrometer.
It is very hard to do that since their API is not well designed for that (or was not designed with that in mind). FYI: We don't have enough tracing APIs let's create another one :))) |
Well that's true, conceptually it's the same. It results in different
Doesn't that sort of work by default? If you're using a |
I would like to re-phrase this feature request: Can we allow deleting or unregistering Callbacks? For discussion in the Tuesday Jan 25 8AM PT Spec SIG. |
There is a connected topic discussed in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#asynchronous-example-attribute-removal-in-a-view The question is how, if at all, should an SDK know to stop reporting a metric? |
Just wanted to add an example showing that it's fairly easy to introduce memory leaks into apps without the ability to unregister callbacks. For example, if there is a connection pool that is being decommissioned, possibly due to a database resharding, then it should be able to be garbage collected. But with any natural registration (no complication of weak references) of an async instrumentation, without the ability to remove the callback from OpenTelemetry this pool can never be collected. |
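The leak described above can be reproduced with a small stand-alone sketch. `CallbackRegistry` and `ConnectionPool` are hypothetical names invented for illustration; this is not the OpenTelemetry API, only a model of an SDK that holds strong references to registered callbacks and hands back an unregister function.

```python
import gc
import weakref


class CallbackRegistry:
    """Hypothetical stand-in for an SDK's list of asynchronous callbacks."""

    def __init__(self):
        self._callbacks = {}
        self._next_id = 0

    def register(self, callback):
        """Keep a strong reference to the callback; return an unregister function."""
        handle = self._next_id
        self._next_id += 1
        self._callbacks[handle] = callback

        def unregister():
            # Dropping the strong reference lets the callback's owner be collected.
            self._callbacks.pop(handle, None)

        return unregister

    def collect(self):
        """Invoke every registered callback, as a periodic reader might."""
        return [callback() for callback in self._callbacks.values()]


class ConnectionPool:
    """Hypothetical instrumented resource with a limited lifetime."""

    def __init__(self, active):
        self.active = active

    def observe_active(self):
        return self.active


registry = CallbackRegistry()
pool = ConnectionPool(active=3)
# The bound method holds a reference to the pool, so the registry keeps it alive.
unregister = registry.register(pool.observe_active)
pool_ref = weakref.ref(pool)

first = registry.collect()
del pool
gc.collect()
alive_while_registered = pool_ref() is not None

unregister()
gc.collect()
alive_after_unregister = pool_ref() is not None
after = registry.collect()
```

In this sketch the pool stays reachable for as long as its callback is registered, and becomes collectable only once `unregister()` has been called.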
To back up @anuraaga's comment, while it's not as popular, old-school Java EE servers (like Tomcat) or anything that uses hot / dynamic loading of plugins need the ability to clean up their memory and usage so Java can evict their bytecode and RAM. For @bogdandrutu I want to directly address the notion that Micrometer should use MetricProducer as is done for OpenCensus. The only reason we use the MetricProducer bridge for OpenCensus is because the API is so divergent from OpenTelemetry that we are unable to do a direct API bridge. If we could have done a direct API bridge, we'd have preferred that one. However, because the OpenCensus metric model is close enough to OTel's, we're not losing too much with the MetricProducer bridge. I think @mateuszrzeszutek raises some great points around Metric bridges, and I want to call out a few of them:
An API <-> API bridge provides the easiest path forward for instrumentation authors to accomplish that goal. |
@jsuereth Nice to hear all these great things, but I want to also ensure that we are not adding unnecessary complexity to our APIs to support this, hence the "very divergent" part that you mention. The proposal says if this API has a bad thing we should support it because we prefer this type of integration.
A "MetricProducer bridge" will give that to the micrometer users.
A "MetricProducer bridge" will give that to the micrometer users. Resource can still be attached to the bridge instance.
Not 100% convinced that we cannot offer this, but I think we can with a bit of work. @jsuereth it is very important to distinguish between an "instrumentation user friendly API", which is designed to be used in the application instrumentation, vs a "sink" API, which is what you are doing when wrapping this API. I don't want us to become a "sink" API; if that is the goal then let's have a sink API design and not try to change the current API, designed for users to instrument applications, to fit that goal. Another argument is that micrometer is not an API :) sorry but it has way too much in that artifact to be considered an API. |
@jsuereth also I do agree that there are good reasons to support unregister/close/etc for an instrument, but I don't think the reason should be that another API has it. |
@jsuereth @mateuszrzeszutek is #2317 resolving this? I don't have that feeling... |
To clarify my intention, I believe #2317 narrowly solves this problem stating the SDK SHOULD give applications a way to create callbacks that support being unregistered, which allows a user to stop certain asynchronous measurements from reporting without shutting down an entire
The Prometheus ecosystem uses a
As the description states, users can simply stop using synchronous instruments if they want to stop reporting measurements; this doesn't seem like a problem we need to help the user with, but if we did, I would suggest that stop/close/destroy simply invalidates the instrument instance so that it can no longer be used to capture measurements.
I suggest we file a separate issue or issues to discuss the following points:
Need for Delete() verb?
The
Need for exporter memory preferences?
In a push-based exporter, whether using cumulative or delta temporality, there's a question of whether to push a value that has not changed. Since we expect cumulative data to be pushed into a Prometheus ecosystem, it seems wise to default to pushing all values, even unchanged ones. In an exporter with a preference for deltas, it's typical to avoid reporting any delta where the delta (sum) or count (histogram) are zero.
For synchronous instruments, this is something the SDK can facilitate by detecting stale streams and, after a while, forgetting them. SHOULD the SDK be required to support forgetting streams?
For asynchronous instruments, if the user simply stops reporting values they will stop reporting, unless we require the SDK to detect staleness. Should the Prometheus exporter be responsible for this on its own, or should the SDK facilitate an exporter preference for "synchronous instrument memory"?
This topic is connected with #2132 because if the SDK is required to detect staleness, then no additional memory is required to perform cumulative-to-delta translation.
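To make the "detect stale streams and, after a while, forget them" idea concrete, here is a minimal sketch for a delta-preferring sum. `DeltaAggregator` and its `max_idle_cycles` parameter are hypothetical and not part of any OpenTelemetry SDK; the eviction policy (a fixed number of idle collection cycles) is an assumption chosen only for illustration.

```python
class DeltaAggregator:
    """Hypothetical per-attribute-set sum storage with staleness-based eviction."""

    def __init__(self, max_idle_cycles):
        self.max_idle_cycles = max_idle_cycles
        self._sums = {}  # attribute key -> delta accumulated since the last collect
        self._idle = {}  # attribute key -> consecutive collects that saw a zero delta

    def add(self, value, attributes):
        key = tuple(sorted(attributes.items()))
        self._sums[key] = self._sums.get(key, 0) + value
        self._idle[key] = 0

    def collect(self):
        """Return only non-zero deltas and forget streams that stayed idle too long."""
        points = {}
        for key in list(self._sums):
            delta = self._sums[key]
            if delta != 0:
                points[key] = delta
                self._sums[key] = 0
            else:
                self._idle[key] += 1
                if self._idle[key] >= self.max_idle_cycles:
                    # The stream is considered stale: release its memory.
                    del self._sums[key]
                    del self._idle[key]
        return points

    def tracked_streams(self):
        return len(self._sums)


agg = DeltaAggregator(max_idle_cycles=2)
agg.add(5, {"pool": "a"})
agg.add(1, {"pool": "b"})
first = agg.collect()

agg.add(2, {"pool": "b"})
second = agg.collect()  # pool "a" reports nothing and is idle for one cycle
third = agg.collect()   # pool "a" reaches the idle limit and is forgotten
streams_left = agg.tracked_streams()
```

A cumulative exporter would need a different policy, since forgetting a stream there also discards its running total.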
@mateuszrzeszutek please signal whether simply the option to unregister callbacks is enough, coupled with memory options for exporters? Or are you looking for something like Prometheus' |
I believe that should be enough. Removing all callbacks of an async instrument effectively functions like Shall I close this issue? We can open another one just for the exporter memory feature. |
Greetings, When code can be loaded and unloaded in a dynamic plugin (for example, dlopen() in C/C++), the application needs a way to unregister asynchronous instruments on unload, to avoid having the SDK call stale callbacks. Without this, asynchronous metrics seem impossible to use with dynamic load / unload. Regards. |
Each asynchronous instrument already requires support for an |
Thanks @jack-berg for the clarifications. The spec wording is (emphasis mine):
In my understanding:
Currently the opentelemetry-cpp API does the latter. Regards |
@open-telemetry/specs-approvers Some update on this issue, about metrics. opentelemetry-cpp now provides a way to remove a callback for an async instrument. While this resolves the immediate concern, namely crashes that previously happened when a measurement invoked a defunct callback function (located in a shared library that was unloaded), there are remaining issues. First, the Meter still holds definitions for instruments, sync and async, which remain even though they are no longer measured. The problem here is that exporters still export data for these defunct meters / metrics, as tested with the OTLP HTTP exporter. (Edit 2023-06-12: To analyse in detail) It seems the opentelemetry specification lacks the following:
A spec is needed for languages (like opentelemetry-cpp) to implement it. This is a major concern, as there is no way in practice to use Metrics for OpenTelemetry in an environment where libraries are loaded and unloaded dynamically, when these libraries provide their own instrumentation. |
I'd like to add to this issue in favor of having a cleanup mechanism in the spec: We perform monitoring of IoT devices which can be added/removed from our systems at any time by operators and would also benefit from being able to stop reporting metrics when those IoT devices are being decommissioned. Currently, the .NET implementation of asynchronous counters/gauges supports removing metric points from a specific metric, but this causes the last value emitted for that metric point to be re-exported ad infinitum (or just become a stale entry in the metric point list when using delta export mode) by the OTEL SDK implementation. Another thing that would be interesting to specify is stale markers for when a time series is known to be "finished", e.g. Prometheus Remote Write also defines stale markers that tell Prometheus a time series is known to be stale, so that it is immediately marked as stale and no longer shows up. I think it would be nice to have a specified way to do something similar via OTLP. I would be willing to spend some time working on this if there is interest. Thanks for the hard work on OTEL, by the way! EDIT: I had the wrong link to stale markers in Prometheus |
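One way to picture the stale-marker idea is a last-value table that, when a series is removed, exports a single "stale" point and then drops the series instead of repeating its last value. The sketch below is purely illustrative: `SeriesTable`, `Point`, and the `stale` field are hypothetical and do not correspond to the .NET SDK, OTLP, or Prometheus Remote Write data models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    series: str
    value: Optional[float]
    stale: bool = False  # hypothetical flag: "this series has ended"


class SeriesTable:
    """Hypothetical last-value table for an asynchronous gauge, keyed by device id."""

    def __init__(self):
        self._last = {}
        self._removed = set()

    def observe(self, series, value):
        self._last[series] = value
        self._removed.discard(series)

    def remove(self, series):
        """Forget the series and schedule one stale marker for the next export."""
        if series in self._last:
            del self._last[series]
            self._removed.add(series)

    def export(self):
        points = [Point(s, v) for s, v in sorted(self._last.items())]
        # Each removed series is announced once, then never exported again.
        points += [Point(s, None, stale=True) for s in sorted(self._removed)]
        self._removed.clear()
        return points


table = SeriesTable()
table.observe("device-1", 21.5)
table.observe("device-2", 19.0)
first = table.export()

table.remove("device-2")  # the device was decommissioned
second = table.export()   # carries a single stale marker for device-2
third = table.export()    # device-2 no longer appears at all
```

Whether the marker should live in the SDK, the exporter, or the wire protocol is exactly the open question raised in the comment above; the sketch takes no position on it.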
At the moment, it is not possible to destroy instruments that send metrics. Therefore, when a user removes a domain, metrics about it might still be emitted. A temporary workaround is to restart the pulpcore-api process to reload meters. Ref: open-telemetry/opentelemetry-specification#2232 closes pulp#4603
We're elevating this and we have closed #3985 in favour of this issue |
What are you trying to achieve?
I'm currently working on the micrometer->OTel bridge instrumentation in the javaagent. Micrometer offers the possibility to remove a meter from the MeterRegistry and stop emitting whatever metrics it used to collect. For example, suppose you use a database connection pool that's instrumented with metrics - when you close/destroy the whole pool you probably want to stop collecting any metrics associated to it (because it doesn't exist anymore). This is useful for both asynchronous instruments (since once they're registered there's no way to stop them) and synchronous instruments (you can just stop using them, but the metrics SDK will still send in the last recorded value).
Additional context.
Micrometer bridge PR: open-telemetry/opentelemetry-java-instrumentation#4919
CC @jsuereth
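For the synchronous case in the request above, a closeable instrument could look roughly like the following. This is a sketch under stated assumptions: `Meter`, `Counter`, and `close()` are hypothetical names, not the OpenTelemetry API, and the behavior shown (closing invalidates the instrument and removes it from collection) is just one possible reading of "unregister/stop/destroy".

```python
class Meter:
    """Hypothetical meter that tracks its live instruments."""

    def __init__(self):
        self._instruments = []

    def create_counter(self, name):
        counter = Counter(self, name)
        self._instruments.append(counter)
        return counter

    def _remove(self, instrument):
        if instrument in self._instruments:
            self._instruments.remove(instrument)

    def collect(self):
        """Report the cumulative total of every instrument that is still registered."""
        return {instrument.name: instrument.total for instrument in self._instruments}


class Counter:
    """Hypothetical synchronous counter that can be closed."""

    def __init__(self, meter, name):
        self._meter = meter
        self.name = name
        self.total = 0
        self._closed = False

    def add(self, value):
        if self._closed:
            return  # a closed instrument silently ignores further measurements
        self.total += value

    def close(self):
        """Invalidate the instrument and stop reporting its last recorded value."""
        self._closed = True
        self._meter._remove(self)


meter = Meter()
created = meter.create_counter("pool.connections.created")
created.add(2)
before = meter.collect()

created.close()   # e.g. the connection pool was destroyed
created.add(5)    # ignored, because the instrument is closed
after = meter.collect()
```

Without the `close()` step, `collect()` in this model would keep returning the last cumulative value forever, which is the behavior the request describes for synchronous instruments.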