
Automatic propagation of peer.service #247

Conversation

@carlosalberto (Contributor) commented Jan 8, 2024

Knowing the service name on the other side of a remote call is valuable troubleshooting information. The semantic conventions represent this via peer.service, which needs to be manually populated.

This information can be effectively derived in the backend using the Resource of the parent Span, but is otherwise not available at Collector processing time, where it could be used for sampling and transformation purposes.

Defining (optional) automated population of peer.service will greatly help adoption of this attribute by users and vendors explicitly interested in this scenario.

Based on open-telemetry/semantic-conventions#439
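To make the proposal concrete: the idea is that the calling SDK writes its own service name into the W3C `tracestate` header at injection time, so the receiver (or a Collector) can read the caller without consulting the parent Span's Resource in the backend. A minimal sketch in plain Python; the `ot` vendor key and `peer:` sub-field are hypothetical placeholders, not anything this OTEP specifies:

```python
def inject_peer_service(tracestate: str, service_name: str) -> str:
    """Prepend a hypothetical ot=peer:<name> entry to a W3C tracestate header.

    Per the W3C Trace Context rules, a modified entry moves to the
    left-most position, and any stale entry for the same key is removed.
    """
    entries = [e.strip() for e in tracestate.split(",") if e.strip()]
    # Drop any stale entry for our (hypothetical) vendor key.
    entries = [e for e in entries if not e.startswith("ot=")]
    entries.insert(0, f"ot=peer:{service_name}")
    return ",".join(entries)


print(inject_peer_service("congo=t61rcWkgMzE", "checkout-service"))
# ot=peer:checkout-service,congo=t61rcWkgMzE
```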

@carlosalberto carlosalberto requested review from a team January 8, 2024 16:59
@yurishkuro (Member) left a comment


This OTEP does not discuss the rollout risks. The new field has very specific semantics: it must be modified or erased before calling a downstream service, and neither of those is the default behavior (the default is to propagate state unchanged). So if this behavior is introduced into some SDKs, the data for this attribute within a company will be a complete mess until every service deploys the updated SDK version.

In general, I am not a fan of using baggage for things that are supposed to change on every hop; it is a clear abuse of the mechanism.

Review comment on `text/trace/0247-peer-service-propagation.md` (outdated):

Automatic propagation of `peer.service` through `TraceState`.

## Motivation
A Member commented:

The motivation does not read as convincing to me. Why does a service need to know who called it? If this is part of a transitive root-cause-isolation workflow, then you would just use distributed traces for that. If this is about business-specific behavior depending on who called you (e.g., multi-tenant behavior), then I think this mechanism is quite inappropriate: relying on the deployment/infra name of the caller is a pretty narrow use case, not suitable for general-purpose multi-tenancy. So please describe the Users and Jobs To Be Done of this feature.


I could see this being helpful in two scenarios in my work:

  1. A sampling rule where you look at a given event and can see that its caller is X. For example, filtering out noisy branches in a trace while keeping a particular caller you care about.
  2. Folks who want to get this information eventually via tracing but can't today; if there were an easier way to "add OTel" and get this caller info without fully adopting tracing, that would be helpful for them.
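The first scenario above (drop noisy callers, keep selected ones) could be expressed as a simple sampler-side rule. A hedged sketch, assuming the caller's name has already been extracted from `tracestate` into a plain string; the caller names and function name are purely illustrative:

```python
# Hypothetical deny-list of noisy callers; names are illustrative only.
NOISY_CALLERS = {"health-checker", "metrics-scraper"}


def should_sample(caller, keep_list=frozenset()):
    """Drop spans whose caller is known-noisy, unless explicitly kept."""
    if caller is None:
        return True  # no caller info: fall through to the default policy
    if caller in keep_list:
        return True
    return caller not in NOISY_CALLERS


print(should_sample("health-checker"))                       # False
print(should_sample("health-checker", {"health-checker"}))   # True
print(should_sample("checkout-service"))                     # True
```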

@jmacd (Contributor) commented Jan 18, 2024

Related work, cc @bogdandrutu @kalyanaj
w3c/trace-context#550

@carlosalberto (Contributor, Author):

Hey @yurishkuro

I added sampling scenarios that may shed light on how useful this feature could be. Please review.


### Use scenarios

Sampling can benefit from knowing the calling service, specifically:
A Contributor commented:

As sampling is mentioned as a primary usage scenario for this feature, I wonder if it would make sense to bundle this together with the other sampling-related values for which there is a proposal to propagate them via trace state: #235

All those values could then be consistently used and populated by samplers, and one wouldn't need to invent a new configurable mechanism in SDKs (at least for the client side).

A Contributor replied:

This value makes sense to propagate with the sampling-related attributes in the tracestate, which are now covered in open-telemetry/semantic-conventions#793. Still, I see it as an independent property.

@tedsuo tedsuo added the triaged label Jan 29, 2024
@jmacd jmacd requested a review from kentquirk March 7, 2024 16:42
@jmacd (Contributor) commented Mar 7, 2024

@carlosalberto I think we should try to build in more protection against accidental propagation of the peer-service information. Also, I'm afraid "upstream" can lead to confusion: unless we do something to avoid accidental propagation, it is literally true that the upstream service name would be the nearest ancestor context that happened to set it.

I want us to consider a mechanism that helps us scope tracestate variables to limit their impact in the future. I'm thinking of an entirely new tracestate vendor code for such information (e.g., `to` for "Transient OpenTelemetry") that would purposefully terminate after a propagation event. (I think of this approach as complementary to the idea in #207, which is about scoping state until the following propagation event.)

That said, this seems like an opportunity to allow peers to exchange more than only their service name. If we had a tracestate field for exchanging arbitrary peer-related variables, then the new SDK configuration knobs would be:

  • which attributes to Inject
  • which variables to apply as "context-scoped" (in the sense of Propose Context-scoped attributes. #207), in such a way that they apply to the span (or, e.g., metrics or logs) following Extract

Then, a tracestate could be formed to exchange arbitrary attributes, like:

`tracestate: to=peer.service:myservice;some.property:somevalue;etc:etc`

Receivers would apply these variables to the context and drop them from the tracestate before creating a new context.
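The receive-side behavior described above (apply the transient variables to the incoming context, then strip the entry so it never propagates a second hop) could be sketched as follows; the `to=` key and `key:value;...` encoding follow the illustrative example in the comment and are not a specified format:

```python
def extract_transient(tracestate):
    """Split a hypothetical transient 'to=' entry out of a tracestate header.

    Returns (attributes to apply to the incoming context,
             the remaining tracestate to propagate downstream).
    """
    attrs = {}
    kept = []
    for entry in tracestate.split(","):
        entry = entry.strip()
        if entry.startswith("to="):
            # Parse the semicolon-separated key:value pairs, then drop
            # the whole entry so it terminates at this hop.
            for pair in entry[len("to="):].split(";"):
                key, _, value = pair.partition(":")
                attrs[key] = value
        elif entry:
            kept.append(entry)
    return attrs, ",".join(kept)


attrs, rest = extract_transient(
    "to=peer.service:myservice;some.property:somevalue,congo=t61rcWkgMzE")
print(attrs)  # {'peer.service': 'myservice', 'some.property': 'somevalue'}
print(rest)   # congo=t61rcWkgMzE
```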

@carlosalberto (Contributor, Author):

Closing this, as there doesn't seem to be enough interest and I don't plan to work on it myself.

(It can always be "resurrected" by copying the OTEP file and opening it as a new proposal, etc.)
