Skip to content
This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Return suppressed instrumenation context from noop context manager #27

Closed
dyladan opened this issue Mar 31, 2021 · 10 comments
Closed

Return suppressed instrumenation context from noop context manager #27

dyladan opened this issue Mar 31, 2021 · 10 comments

Comments

@dyladan
Copy link
Member

dyladan commented Mar 31, 2021

Is this something we can/should improve in API? The main idea was to return Noop instances if version doesn't match but it should not crash nor end up in endless loops.

I believe that it does return NOOP. Our endless loop happened due to the context not being set and read by the same api version, causing not to work - which is a recipe for infinite loops.
There might be other issues such as propagation not working etc, but it will probably resolve when we move to api v.1.0.0 everywhere.

Originally posted by @blumamir in open-telemetry/opentelemetry-js#2055 (comment)

We can prevent this endless loop by returning a suppressed instrumentation context from the noop context manager. If the user is manually tracing without a context manager, then calling context.active() should not be used anyway.

One important caveat to the above is that a call to startSpan will by default grab the active context by default. If this context has instrumentation suppressed then the returned span will be noop. Possibly we could introduce a new constant context similar to ROOT_CONTEXT but which is actually NOOP_CONTEXT (only returned by the noop context manager) which can be used to determine if the context manager is enabled?

@dyladan dyladan transferred this issue from open-telemetry/opentelemetry-js Mar 31, 2021
@dyladan
Copy link
Member Author

dyladan commented Mar 31, 2021

One open question: is this considered a breaking change? @open-telemetry/javascript-maintainers

On further thought, it is for sure breaking if we return a suppressed instr context. If the user is manually tracing, startSpan would now return a noop span if the user does not explicitly pass a ROOT_CONTEXT.

This break may be mitigated by returning a CONTEXT_MANAGER_DISABLED context instead of the INSTRUMENTATION_SUPPRESSED context.

@Flarna
Copy link
Member

Flarna commented Mar 31, 2021

Does this really solve a problem? If there is all noop it's don't care if we return ROOT_CONTEXT or NOOP_CONTEXT as the Tracer in question is a NoopTracer.
If we have a mixmax of API/SDK versions with ContextManager, TracerProvider,.. "partly" present and mixed with Noops it seems more luck that the path choosen is that one solved here.

I think there are by far too much places where context.active() is used as default to ensure that this is has no side effects. If we go this path I would prefer to remove all this automatism/defaulting.

Just think about the following (ref here): api.propagation.extract(context.active(), carrier) - this would result in a suppressing context even if a valid span is extracted.
I know this can be fixed by using ROOT_CONTEXT instead but how much code is in the wild using context.active()?

So in short if we do this I would consider it as breaking.

Please consider also the specified Behavior of the API in the absence of an installed SDK.
I know this part of the spec is anyway hard to fullfill in node without context manager.

@dyladan
Copy link
Member Author

dyladan commented Mar 31, 2021

Another reasonable option might be to instead of Noop context manager use an extremely simple sync context manager.

@vmarchaud
Copy link
Member

Another reasonable option might be to instead of Noop context manager use an extremely simple sync context manager.

Well on web i don't see an issue (since there is already one) but we know that will not work on node in most case (specially since a lot of people are adopting async/await syntax). Plus performance cost, which is important when "nothing" should be happening.

Please consider also the specified Behavior of the API in the absence of an installed SDK.
I know this part of the spec is anyway hard to fullfill in node without context manager.

Does it make sense to "try" to fullfill anyway since we know its almost impossible ?

@dyladan
Copy link
Member Author

dyladan commented Mar 31, 2021

@Flarna there is no chance for the user to pass a context to be used for an export. We could create a ManualSpanProcessor for this usecase to be used by manual tracers, but then we are opening a whole new can of worms I would prefer not to get into with plenty of fun new ways for users to shoot themselves in the foot.

If there is no SDK installed, the user should still be able to do propagation by manually passing context. This is still possible with suppressed instrumentation, noop, or disabled contexts because the noop span propagates ids. In this usecase, we don't need to worry about the export infinite loop anyway because there is no exporter.

With an SDK that fails to install because of a mismatched API version with the end user's API version the user may call methods on the tracer from that SDK and enter this infinite loop. There are several places we could break this loop:

  1. If the SDK fails to install, do not install autoinstrumentation
  2. Detect in autoinstrumentation if sdk is installed/enabled or not. If not, do not create spans. We cannot enforce this behavior perfectly in third party instrumentations, but we can do our best to make this the default behavior of our instrumentation base class (perhaps by returning a noop tracer on the tracer getter if the sdk is not installed?)
  3. return a boolean from register calls which indicates success/failure to install. This puts the burden on the user to check this boolean. Perhaps still entering an infinite loop, but with at least one additional tool to help debug?
  4. Exporter detects when it is trying to export a span it created itself. This puts the burden entirely on exporters. Similar to third party instrumentations, we have no control over third party exporters to enforce this behavior.

There may be other fixes, but these are the ones I can think of now without returning suppressed instrumentation context from the noop context manager. Of these, I think option (1) or (2) is my favorite. I still have not decided between them. (2) is likely easier to implement.

@Flarna
Copy link
Member

Flarna commented Mar 31, 2021

@Flarna there is no chance for the user to pass a context to be used for an export. We could create a ManualSpanProcessor for this usecase to be used by manual tracers,

Sorry but I don't understand this statement at all. Is this related to my comment above? I was talking about extract not export.

What is "the SDK" in your comment?
If it is a TracerProvider and installation fails the API default should be used which is a NoOp.
Clearly this TracerProvider may be passed around (e.g. passed into instrumentations via registerInstrumentation(),...) but once some instrumentation is using it together with the global ContextManager it fails.

I agree that we might break the exporter loop by using suppressInstrumentation in NoopContextManager. But we will for sure not fix all issues which people have in such mixed environments.

We could disable a TracerProvider if it is tried to be installed global but this fails. But I doubt that this is enough.

I think the main issue is that we allow more TracerProviders, ContextManagers. If there is only one global instance and this one is used by all instrumentations,... it's either always a Noop or never.

In current setup it's by far to easy to use a Tracer created by a local TracerProvider which in turn uses the global ContextManager. Even if there is no endless loop in exporting the overal behavior is some sort of undefined.

@Flarna
Copy link
Member

Flarna commented Mar 31, 2021

Does it make sense to "try" to fullfill anyway since we know its almost impossible ?

Well, I think it depends on the actual use case. Supporting this alone by API will for sure not work because the Propagators are already Noops.

In an Http Proxy or similar it should work to install HttpTraceContext propagator, AsyncHooks context manager and Http Instrumenation. No need for a complete export chain.
I could imagine that there are even simpler load balancers where only extract and inject is needed and direct passing of context works fine (so no need for ContextManager and Instrumentation).

To my understanding the use case here is to keep traces alive across a process which can't export spans (e.g. because it is used by multiple tenants therefore a single exporter would be wrong).

Anyhow, this usecase should not be the main focus here. I mainly noted it because I thought it's related/possibly effected. But as more I think about it I tend to say I was wrong regarding this.

@dyladan
Copy link
Member Author

dyladan commented Apr 7, 2021

@Flarna there is no chance for the user to pass a context to be used for an export. We could create a ManualSpanProcessor for this usecase to be used by manual tracers,

Sorry but I don't understand this statement at all. Is this related to my comment above? I was talking about extract not export.

Ah oops sorry

What is "the SDK" in your comment?
If it is a TracerProvider and installation fails the API default should be used which is a NoOp.
Clearly this TracerProvider may be passed around (e.g. passed into instrumentations via registerInstrumentation(),...) but once some instrumentation is using it together with the global ContextManager it fails.

My comment referred to the context manager. There was actually a context manager properly installed, but an instrumentation used an incompatible version of the API so it received a Noop context when it tried to call context.active(). Since this context does not suppress instrumentation, it traced the request which happened to be a zipkin export. This span was also exported, traced, and we end up in an infinite loop.

I agree that we might break the exporter loop by using suppressInstrumentation in NoopContextManager. But we will for sure not fix all issues which people have in such mixed environments.

I don't know if we can fix all issues in mixed environments, but we can do our best to mitigate damage.

We could disable a TracerProvider if it is tried to be installed global but this fails. But I doubt that this is enough.

In this case, the global install didn't fail. It was the instrumentation which received a context from an incompatible API. What we need is for some way for the instrumentation to know the context it received is not a good context.

I think the main issue is that we allow more TracerProviders, ContextManagers. If there is only one global instance and this one is used by all instrumentations,... it's either always a Noop or never.

In current setup it's by far to easy to use a Tracer created by a local TracerProvider which in turn uses the global ContextManager. Even if there is no endless loop in exporting the overal behavior is some sort of undefined.

I would also like to only have one global, but there is no way to prevent users from keeping their reference after registering the global. Also, this wouldn't have fixed this particular loop.

@Flarna
Copy link
Member

Flarna commented Apr 7, 2021

I think we should at least update our samples to never pass a TracerProvider around without really needed. e.g. registerInstrumentation() perfectly picks up the global TracerProvider so there is no need to pass it directly and having the risk that instrumentations use a different TracerProvider then the remaining system.

We should still keep the option to do it but document the risks.

@dyladan
Copy link
Member Author

dyladan commented Oct 10, 2022

This infinite loop is fixed in recent API versions

@dyladan dyladan closed this as completed Oct 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants