Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process.cpu.count metric to semantic conventions for OS process metrics. #651

Open
Yun-Ting opened this issue Feb 9, 2023 · 8 comments
Assignees

Comments

@Yun-Ting
Copy link

Yun-Ting commented Feb 9, 2023

What are you trying to achieve?

Add a metric to expose number of available processors to the current process to
semantic conventions for OS process metrics.

Proposed instrument name, type, unit and description:

Name Instrument Type (*) Unit Description Labels
process.cpu.count UpDownCounter {processors} Number of processors (CPUs) available to the current process.

Currently, the definition of process.cpu.utilization is "Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs available to the process", which requires maintaining the state (in this case, the last collection time) of the instrument.

The challenge encountered during implementation in .NET is:
open-telemetry/opentelemetry-dotnet-contrib#831

Potential workarounds:
open-telemetry/opentelemetry-dotnet-contrib#948

What did you expect to see?

Add process.cpu.count metric to the semantic conventions and let the backend do the computation.
Given instrument values of process.cpu.time and process.cpu.count, the backend will have sufficient data to calculate the CPU utilization metric.
open-telemetry/opentelemetry-dotnet-contrib#981

Additional context.

Previous discussion related to this topic:
open-telemetry/opentelemetry-specification#2392

@trask
Copy link
Member

trask commented Feb 9, 2023

the definition of process.cpu.utilization is "Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs available to the process", which requires maintaining the state (in this case, the last collection time) of the instrument.

I think this is a really interesting point.

It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.

Maybe it's good to remove process.cpu.utilization altogether, in favor of process.cpu.time + process.cpu.count?

@trask
Copy link
Member

trask commented Feb 9, 2023

It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.

@alanwest made similar open-telemetry/opentelemetry-specification#2439 (comment):

In the event of multiple meter providers, their reporting intervals may be different. So, calculating the difference in process.cpu.time since the last measurement requires the instrumentation to maintain some state per meter provider. I don't think the metric API spec offers the support necessary for this.

@lalitb
Copy link
Member

lalitb commented Feb 10, 2023

It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.

Agree. It would be instead useful if process.cpu.utilization represents real-time percentage cpu utilization for given process (as provided in %CPU column of top command om posix), as an UpDownCounter. Or maybe have a separate metrics for that. For C++ applications on linux, process.cpu.count would mostly be the total number of cores in the machine, and same value would be recorded each time. Probably it is more useful to .Net Runtime or JVM if they provides more accurately the number of processors/cores allocated to it. Nothing against this PR though :)

@alanwest
Copy link
Member

Maybe it's good to remove process.cpu.utilization altogether, in favor of process.cpu.time + process.cpu.count?

At a minimum, the spec needs to clarify that it is not possible for instrumentation written using OTel's language APIs to generate the process.cpu.utilization (at least as it is currently defined).

Regarding removing it altogether, I'm not sure..

The collector's hostmetrics receiver generates the process.cpu.utilization metric.

It is my understanding that the semantic conventions of the spec are not limited to only describing metrics produced using language APIs. If the collector produces this metric, and its semantics match what is here in the spec, then shouldn't we leave it in the spec?

It would be instead useful if process.cpu.utilization represents real-time percentage cpu utilization for given process (as provided in %CPU column of top command om posix), as an UpDownCounter.

I agree. This sounds like the semantics of the process.runtime.jvm.cpu.utilization metric:

Recent CPU utilization for the process. [2]

[2]: These utilizations are not defined as being for the specific interval since last measurement (unlike system.cpu.utilization).

Unless there is good reason not to, I would prefer to redefine the process.cpu.utilization to be the same as Java's process.runtime.jvm.cpu.utilization. If this is not possible, then perhaps all languages could have a process.runtime.*.cpu.utilization metric.

Regarding process.cpu.count, I'm also not against it, but if it's primary purpose is to enable computing utilization, then I think it would be ideal to just produce it directly.

@arminru
Copy link
Member

arminru commented Feb 13, 2023

From the spec triage meeting: This looks like a reasonable request but the right approach still needs to be decided on, which is already being discussed in this thread.

cc @open-telemetry/instr-wg for your input on this

@Yun-Ting
Copy link
Author

Hi @open-telemetry/instr-wg, may I kindly have your take on this? Thank you.

@dmitryax
Copy link
Member

This should be transferred to https:/open-telemetry/semantic-conventions

@tigrannajaryan tigrannajaryan transferred this issue from open-telemetry/opentelemetry-specification Jan 18, 2024
@joaopgrassi
Copy link
Member

@open-telemetry/semconv-system-approvers can please take a look at this and see if needs more info or can be added to the system semconv project? thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants