Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetrySdkEventSource updates for activity tracking #5387

Closed
wants to merge 5 commits into from

Conversation

pharring
Copy link

@pharring pharring commented Feb 23, 2024

Changes

This change addresses issues with the OpenTelemetrySdkEventSource discovered during testing of profiler support for .NET applications. For example, we use dotnet-trace to profile an OpenTelemetry application. To enable activity tracking, we enable the OpenTelemetrySdkEventSource like this:

dotnet-trace collect -p <pid> --profile cpu-sampling --providers OpenTelemetry-Sdk

In the resulting trace, however, we observe the following problem:

  • ActivityIds are not set on the ActivityStarted/Stopped events from the OpenTelemetry-Sdk provider.

This is because of two problems:

  1. The ActivityStarted/Stopped events are not emitted with the Start/Stop OpCodes.
  2. Even when the OpCodes are added the ActivityTracker doesn't pair up the two events because they are using the wrong suffixes. The ActivityTracker specifically checks for the suffixes "Start" and "Stop". See item 4 here: https://learn.microsoft.com/dotnet/core/diagnostics/eventsource-instrumentation#best-practices and the code here.

To address these two issues: 

  1. Added new Activities keyword.
  2. Added new ActivityStart and ActivityStop events using the Activities keyword and Informational level.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

Copy link

linux-foundation-easycla bot commented Feb 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pharring pharring marked this pull request as ready for review February 24, 2024 00:12
@pharring pharring requested a review from a team February 24, 2024 00:12
Renamed ActivityStarted/Stopped to ActivityStart/Stop
Add tests
Prevent concurrent execution of EventSource tests
Copy link

codecov bot commented Feb 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.22%. Comparing base (6250307) to head (1c55667).
Report is 98 commits behind head on main.

❗ Current head 1c55667 differs from pull request most recent head 89ff9d9. Consider uploading reports for the commit 89ff9d9 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5387      +/-   ##
==========================================
- Coverage   83.38%   83.22%   -0.16%     
==========================================
  Files         297      279      -18     
  Lines       12531    11961     -570     
==========================================
- Hits        10449     9955     -494     
+ Misses       2082     2006      -76     
Flag Coverage Δ
unittests ?
unittests-Solution-Experimental 83.20% <100.00%> (?)
unittests-Solution-Stable 82.99% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs 80.00% <100.00%> (-1.91%) ⬇️

... and 47 files with indirect coverage changes

@cijothomas
Copy link
Member

@pharring Can you elaborate on the end goals? The eventsource is purely an internal logging mechanism for the OTel SDK(and other components), and do you plan to rely on that for a particular scenario? If you want to rely on something to get notified on Activity start/stop, then the OTel way would be to author an ActivityProcessor. Alternately, you can achieve that with pure ActivityListener without any OTel dependency as well.

@pharring
Copy link
Author

pharring commented Feb 26, 2024

Hi @cijothomas, the end goal is to be able to light up the "with Activities" views in PerfView when viewing a trace collected from an OpenTelemetry app using an EventPipe profiler like dotnet-trace, or an ETW profiler like PerfView or Visual Studio. Note that these are out-of-proc profilers, so we cannot modify the user's application by adding an ActivityProcessor.

The OpenTelemetrySdkEventSource already emits ActivityStarted/ActivityStopped events at the right place with spanIDs. Unfortunately, because they are mis-named, they don't trigger ActivityTracking. ActivityTracking is what enables the "with Activities" views in PerfView.

It is possible to work around OpenTelemetrySdkEventSource's deficiency by using the DiagnosticSourceEventSource provider instead. It supports ActivitySource tracking. However, the events are verbose (they include all event attributes) if all you need are activity tracking and the spanIds -- this is the case for adding OpenTelemetry support for Application Insights Profiler.

Having had the weekend to think about this, we might be able to improve on this PR without breaking back-compat:
Introduce a new Keyword (there's currently none defined in OpenTelemetrySdkEventSource) specifically for activity tracking and emit additional events (ActivityStart/Stop) under that Keyword (at Informational level). Profilers can then target just that new Keyword, set the level to Informational and they won't see the existing ActivityStarted/Stopped events.

@cijothomas
Copy link
Member

@pharring Thanks for the explanation. I am not that concerned about the back-compat issue, as there was no explicit guarantee about the shape of internal logs.

I am more concerned that you want to take a dependency on this particular naming style, but the SDK does not guarantee to maintain that either. Worst case, there is no guarantee that we won't move away completely from EventSource itself (like in favor of ILogger).

I believe this can be solved elsewhere. For example, dotnet-monitor tool is already able to listen to Metrics, without any OTel component involved. Maybe a similar approach could be used for traces as well, so there is no need of taking a undesired dependency on OTEL SDK internal behavior.

@pharring
Copy link
Author

I am more concerned that you want to take a dependency on this particular naming style, but the SDK does not guarantee to maintain that either.

You might have misunderstood. This isn't something I invented. ActivityTracking via the -Start/-Stop convention and OpCode.Start/Stop is built into the framework and documented. https://learn.microsoft.com/dotnet/core/diagnostics/eventsource-instrumentation#best-practices
Profiler viewers (PerfView, Visual Studio and Application Insights Profiler) use activity tracking (in conjunction with TPL's async causality tracing) to isolate samples (call-stacks) and other events from a single activity (request) and track it end-to-end even across Task (async/await) boundaries.

Worst case, there is no guarantee that we won't move away completely from EventSource itself (like in favor of ILogger).

That would be -- unfortunate. I'm glad you brought it up. There's a unit test in this PR. Is that sufficient to ensure the EventSource's survival? Since the EventSource behavior is not expressible in PublicAPI.Shipped.txt, perhaps I can add a scary comment in the unit test to discourage removal.

I believe this can be solved elsewhere. For example, dotnet-monitor tool is already able to listen to Metrics, without any OTel component involved. Maybe a similar approach could be used for traces as well, so there is no need of taking a undesired dependency on OTEL SDK internal behavior.

It's sort of similar for dotnet-trace. You enable EventSources via EventPipe or ETW with keywords and levels. dotnet-trace works fine for OpenTelemetry apps if you're interested in is app-wide bottlenecks. The problem is if you want to do activity-based (request-based) profiling -- the kind where you look at the timeline of a single request. The viewer needs to collate events and callstacks by activity and, if the app isn't emitting accurate Activity IDs, then everything collapses to a single root activity.

@pharring
Copy link
Author

undesired dependency on OTEL SDK internal behavior.

Wait, are you saying that OpenTelemetrySdkEventSource is not for public consumption? If so, I can respect that. Would you support introducing a new EventSource that is public? Perhaps one that only contains the new ActivityStart/Stop events from this PR? At the same time, perhaps we could rename it to "OpenTelemetry-Sdk-Internal" to make that clear.

@reyang
Copy link
Member

reyang commented Feb 26, 2024

undesired dependency on OTEL SDK internal behavior.

Wait, are you saying that OpenTelemetrySdkEventSource is not for public consumption?

It was only intended for internal troubleshooting, and will be replaced with ILogger #3881 (EventSource was used since ILogger wasn't ready at that time).

If so, I can respect that. Would you support introducing a new EventSource that is public? Perhaps one that only contains the new ActivityStart/Stop events from this PR?

I don't support introducing new EventSource. I support getting rid of all EventSources from OpenTelemetry.

At the same time, perhaps we could rename it to "OpenTelemetry-Sdk-Internal" to make that clear.

💯

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel this is the right direction #5387 (comment).

@pharring pharring closed this Feb 26, 2024
@samsp-msft
Copy link

@pharring why not make this request to the runtime itself? This could be baked into Activity, so you are not dependent on tracing being done with OTel.

@pharring
Copy link
Author

Thank you for the feedback and the suggestion. We'll use DiagnosticSourceEventSource (Microsoft-Diagnostics-DiagnosticSource) for activity tracking instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants