Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create span when SamplingDecision.Drop #3841

Conversation

alanwest
Copy link
Member

@alanwest alanwest commented Oct 28, 2022

Fixes #3290
Fixes #4087

Create spans even on a sampling decision of drop. This conforms the .NET SDK to the specification regarding span creation.

Generate a new span ID for the Span, independently of the sampling decision. This is done so other components (such as logs or exception handling) can rely on a unique span ID, even if the Span is a non-recording instance.

This may have a performance impact as an Activity will always be allocated irrespective of sampling decision. Notable exceptions are the ASP.NET Core and HttpClient instrumentation. The Activity created by these libraries was already always allocated.

@cijothomas
Copy link
Member

Time to really comply with the spec 🤣. I vaguely remember some discussions around this in some SIG calls, and we still decided to use the current approach.... Maybe the heap allocation was a bigger concern?

@alanwest
Copy link
Member Author

Maybe the heap allocation was a bigger concern?

Yes! Was just talking to @CodeBlanch about this. Allocations gonna be painful (note the name of my branch 😆)

But this may be a hit we'll need to take... it's important for both propagating context properly and also correlation of trace data to logs and metrics.

@cijothomas
Copy link
Member

If a root span (t1, s1) is created.. then its child gets not created, and then someone makes a log. the log will be parented to the (t1,s1).

With this change, the child span gets created (t1, s2, parent=s1), but wont see processor or exporter. The log will now be parented to t1,s2.. which might be an issue as log is now pointing to a parent, who is nowhere to be found....

@cijothomas
Copy link
Member

If a root span (t1, s1) is created.. then its child gets not created, and then someone makes a log. the log will be parented to the (t1,s1).

With this change, the child span gets created (t1, s2, parent=s1), but wont see processor or exporter. The log will now be parented to t1,s2.. which might be an issue as log is now pointing to a parent, who is nowhere to be found....

unless we make log point to the span which is sampled or root. (like walk the .Parent chain until we run out of parent)...

@alanwest
Copy link
Member Author

The log will now be parented to t1,s2.. which might be an issue as log is now pointing to a parent, who is nowhere to be found....

My understanding is that this is by design. That is, incomplete traces are to be expected as well as telemetry correlated to things that happened but were not recorded.

@alanwest
Copy link
Member Author

unless we make log point to the span which is sampled or root. (like walk the .Parent chain until we run out of parent)...

If walking the chain is required, then I think this needs to be clarified by the spec.

@CodeBlanch
Copy link
Member

A couple tags...

#3290

#3828 (comment)

@alanwest
Copy link
Member Author

alanwest commented Nov 2, 2022

Closing for now... as discussed in this week's SIG, we do want to make this change, but we want to wait until after the 1.4.0 release. I will pursue fixing up the tests and open the PR back up later.

@alanwest alanwest closed this Nov 2, 2022
@TimothyMothra
Copy link
Contributor

FYI; tested locally and confirmed this change would fix the issue reported in #4087

@cijothomas
Copy link
Member

@TimothyMothra Thanks for investigating it more! This is tagged to resurrected for 1.5. Please chime in with a +1 to the original issue.

@alanwest alanwest reopened this Feb 24, 2023
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Lets add a good writeup in the PR description about the potential impact this has, and a changelog entry.

Highlighting that for a asp.net core + httpclient scenario, the impact is nil, as these libraries would anyway create activities, so no impact.
Higher allocation, for the case when there are more than span in the trace within proc..

@alanwest alanwest marked this pull request as ready for review February 25, 2023 02:16
@alanwest alanwest requested a review from a team February 25, 2023 02:16
@cijothomas
Copy link
Member

@Oberon00
Could you review please?

@codecov
Copy link

codecov bot commented Feb 25, 2023

Codecov Report

Merging #3841 (61b4dd4) into main (a3ec494) will decrease coverage by 0.09%.
The diff coverage is 75.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3841      +/-   ##
==========================================
- Coverage   85.71%   85.62%   -0.09%     
==========================================
  Files         289      289              
  Lines       11374    11341      -33     
==========================================
- Hits         9749     9711      -38     
- Misses       1625     1630       +5     
Impacted Files Coverage Δ
...tryProtocol/Implementation/MetricItemExtensions.cs 93.33% <0.00%> (-0.63%) ⬇️
...metryProtocol/Implementation/ActivityExtensions.cs 95.90% <100.00%> (-0.31%) ⬇️
src/OpenTelemetry/Trace/TracerProviderSdk.cs 99.26% <100.00%> (-0.02%) ⬇️
...ZPages/Implementation/ZPagesExporterEventSource.cs 56.25% <0.00%> (-6.25%) ⬇️
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 76.47% <0.00%> (-5.89%) ⬇️
...lementation/SqlClientInstrumentationEventSource.cs 70.83% <0.00%> (-4.17%) ⬇️
...ter.ZPages/Implementation/ZPagesActivityTracker.cs 97.14% <0.00%> (-2.86%) ⬇️
...tp/Implementation/HttpHandlerDiagnosticListener.cs 73.19% <0.00%> (-1.04%) ⬇️
...p/Implementation/HttpInstrumentationEventSource.cs 76.00% <0.00%> (+4.00%) ⬆️

@@ -240,7 +239,7 @@ internal sealed class TracerProviderSdk : TracerProvider
else if (this.sampler is AlwaysOffSampler)
{
listener.Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
!Sdk.SuppressInstrumentation ? PropagateOrIgnoreData(options.Parent) : ActivitySamplingResult.None;
!Sdk.SuppressInstrumentation ? ActivitySamplingResult.PropagationData : ActivitySamplingResult.None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely hard to read. There is some deep nesting of if/else/Listener callback code going on here. I would suggest to extract some (named, top-level) methods to make the code more readable/reviewable.

@Oberon00
Copy link
Member

Maybe only tangentially related but:

I wonder how one would implement transparent API-only propagation in .NET, where you just echo through extracted incoming span contexts: https:/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#behavior-of-the-api-in-the-absence-of-an-installed-sdk

@reyang
Copy link
Member

reyang commented Feb 27, 2023

@alanwest an Activity is fairly expensive from the garbage collection point of view. Could we explore if it's possible to defer the creation only when some code is reading it? Alternatively, maybe explore if a slim version of Activity can be used/reused rather than a full version.

@Oberon00
Copy link
Member

Oberon00 commented Feb 27, 2023

an Activity is fairly expensive from the garbage collection point of view

Maybe System.Diagnostics.Activity is a bad match for the OpenTelemetry Span interface, since I guess the OTel spec was written imagining that a non-recording Span can be implemented rather efficiently.

@alanwest
Copy link
Member Author

Could we explore if it's possible to defer the creation only when some code is reading it? Alternatively, maybe explore if a slim version of Activity can be used/reused rather than a full version.

I think solving this issue within the opentelemetry-dotnet project may prove challenging. Either deferring creation or implementing a slim version may require instrumentation authors to depend on OpenTelemetry.Api rather than just System.Diagnosics to retrieve current span context. I think this is something we're aiming to minimize.

Activity is fairly expensive from the garbage collection point of view

Talking with @CodeBlanch and trying to brainstorm options we might suggest to .NET team to make Activity itself lighter weight. Though, I guess the first question is whether the size of Activity is the primary concern.

In other words, is Activity expensive from a GC point of view because of its size? Or is it expensive irrespective of size and we really just want to avoid heap allocations altogether?

@cijothomas
Copy link
Member

Maybe only tangentially related but:

I wonder how one would implement transparent API-only propagation in .NET, where you just echo through extracted incoming span contexts: https:/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#behavior-of-the-api-in-the-absence-of-an-installed-sdk

This might be difficult/impossible.
One can extract ActivityContext using Propagators, but it does not have a place to put them in (so that it can be further injected to downstream calls)...unless an Activity itself is created with the extracted Context...

It might be possible to do just the baggage part, as once can use Propagators to extract it, store in Baggage.Current, and it can be injected to downstream calls as well.

@cijothomas
Copy link
Member

Could we explore if it's possible to defer the creation only when some code is reading it? Alternatively, maybe explore if a slim version of Activity can be used/reused rather than a full version.

I think solving this issue within the opentelemetry-dotnet project may prove challenging. Either deferring creation or implementing a slim version may require instrumentation authors to depend on OpenTelemetry.Api rather than just System.Diagnosics to retrieve current span context. I think this is something we're aiming to minimize.

Activity is fairly expensive from the garbage collection point of view

Talking with @CodeBlanch and trying to brainstorm options we might suggest to .NET team to make Activity itself lighter weight. Though, I guess the first question is whether the size of Activity is the primary concern.

In other words, is Activity expensive from a GC point of view because of its size? Or is it expensive irrespective of size and we really just want to avoid heap allocations altogether?

https:/open-telemetry/opentelemetry-dotnet/blob/main/test/Benchmarks/Trace/TraceBenchmarks.cs#L32 Its 416 B per activity minimum without any tags. Adding tags will make it ~650B/activity.

@cijothomas
Copy link
Member

may require instrumentation authors to depend on OpenTelemetry.Api rather than just System.Diagnosics to retrieve current span context

Something like ActivityContext.Current (like Baggage.Current) which can be thought of as just the context part, and can be valid/used even if Activity.Current is null?

@alanwest
Copy link
Member Author

alanwest commented Mar 1, 2023

Something like ActivityContext.Current (like Baggage.Current) which can be thought of as just the context part, and can be valid/used even if Activity.Current is null?

The trouble is, I'm not sure how something like this would work. Consider the following:

using (var span1 = source.StartActivity("SampledSpan"))
{
    using (var span2 = source.StartActivity("NotSampledSpan"))
    {
        // span2 is null here, but let's imagine that ActivityContext.Current = span2's context
        
        // logs and exemplars are correlated to span2's context
        log.Info("Some log message.");
        histogram.Record(100);

        // span2's context is propagated
        httpClient.SendAsync(...);
    }

    // How do we that the scope of span2 has ended?
    // We'd need something like ActivityContext.Current.Stop(), but how would this get invoked?
}

@alanwest
Copy link
Member Author

alanwest commented Mar 1, 2023

My sense is that we're stuck with working within the Activity API as is and optimizing perf here will require collaborating with the .NET team. Could ActivitySource introduce an object pool?

@alanwest
Copy link
Member Author

alanwest commented Mar 1, 2023

Its 416 B per activity minimum without any tags.

This could probably be reduced by doing something like:

class Activity
{
   // This would be null if this Activity was not sampled
   private ActivityData Data;
   
   // This would be a valid context
   private ActivityContext context;
}

class ActivityData
{
    // Move all the fields on Activity today to this class
    // basically the fields here: https:/dotnet/runtime/blob/0a15c3b97dabaf94e5d4399aef2fd48c2b3f446a/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L90-L116
}

But I'm not sure this helps address the primary concern. Activities would still be allocated and require garbage collection. We'd need to explore if reducing the size of Activity reduces the overhead of garbage collection.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Mar 9, 2023
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
6 participants