Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable strict context check and fix some context issues. #2637

Merged
merged 14 commits into from
Mar 29, 2021

Conversation

anuraaga
Copy link
Contributor

No description provided.

@@ -168,6 +164,10 @@ public void onExecutionFailure(
}

private void clearAttributes(ExecutionAttributes executionAttributes) {
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Both why this change is significant and your reaction to it :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a real context leak with big damage - if sync AWS SDK request failed and a subsequent client request of some form was made, it would parent to the AWS SDK request (well it'd be suppressed as a result usually)! Hooray for strict context checking

Context traceContext = tracer.startSpan(Context.current(), context.getMsgList());
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent());
context.setMqTraceContext(contextAndScope);
Context otelContext = tracer.startSpan(Context.current(), context.getMsgList());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like similar netty libraries, there's no thread guarantee for before and end. Since there's no downstream instrumentation it's not a big deal anyways

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about e.g. user-provided message hooks that want to augment existing span?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to provide something like RocketMqTracing.getOpenTelemetryContext(ConsumeMessageContext) to let them extract it I guess. In the meantime, if someone had such code using Context.current() in consumeMessageAfter it would generally be buggy with the chance of Context.current() not actually corresponding to the hook's message due to context leakage so this fix is important first.

I hope such hooks will be able to use our instrumentation-api instead of that though.

return false;
}

// Wrapper of tasks for dispatch - the wrapped task should have context already and this doesn't
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what I'm talking about! But seems like a reasonable hypothesis for now, how akka flakes.

@@ -62,3 +62,7 @@ compileVersion101TestGroovy {
classpath = classpath.plus(files(compileVersion101TestScala.destinationDir))
dependsOn compileVersion101TestScala
}

tasks.withType(Test) {
jvmArgs "-Dio.opentelemetry.javaagent.shaded.io.opentelemetry.context.enableStrictContext=false"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to see where this fails

@@ -168,6 +164,10 @@ public void onExecutionFailure(
}

private void clearAttributes(ExecutionAttributes executionAttributes) {
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@@ -14,6 +14,43 @@
/** Utils for concurrent instrumentations. */
public class ExecutorInstrumentationUtils {

private static final ClassValue<Boolean> NOT_INSTRUMENTED_RUNNABLE_ENCLOSING_CLASS =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, i'm trying to remember to look out for opportunities to use also 😄

@anuraaga
Copy link
Contributor Author

Sorry for the churn, couldn't help but fix some of the leaks. The remaining will take some deeper dives so I think this is it for now

@@ -168,6 +164,10 @@ public void onExecutionFailure(
}

private void clearAttributes(ExecutionAttributes executionAttributes) {
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Both why this change is significant and your reaction to it :D

Context traceContext = tracer.startSpan(Context.current(), context.getMsgList());
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent());
context.setMqTraceContext(contextAndScope);
Context otelContext = tracer.startSpan(Context.current(), context.getMsgList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about e.g. user-provided message hooks that want to augment existing span?

@anuraaga
Copy link
Contributor Author

Unfortunately not able to repro CI failures well but it's nice that it finds them. As far as I can tell there might be something fundamentally wrong with our netty instrumentation, will examine separately first.

@@ -36,6 +37,13 @@ abstract class InstrumentationSpecification extends Specification {
testRunner().clearAllExportedData()
}

def cleanup() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, perhaps we should consider adding an afterTest() method to InstrumentationTestRunnerand calling it here and in InstrumentationExtension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out InstrumentationExtension - adding afterTest seemed a bit weird since it's the same code for agent and library. For now I just copied the code since it isn't so much and I think we can revisit later.

@@ -27,20 +27,18 @@ public void consumeMessageBefore(ConsumeMessageContext context) {
if (context == null || context.getMsgList() == null || context.getMsgList().isEmpty()) {
return;
}
Context traceContext = tracer.startSpan(Context.current(), context.getMsgList());
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ContextAndScope be removed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thanks

protected Boolean computeValue(Class<?> enclosingClass) {
// Avoid context leak on jetty. Runnable submitted from SelectChannelEndPoint is used to
// process a new request which should not have context from them current request.
if (enclosingClass.getName().equals("org.eclipse.jetty.io.nio.SelectChannelEndPoint")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, but maybe those class names should be specified by instrumentation modules? Each instrumented library may bring its own set of excluded classes (also see AdditionalLibraryIgnoresMatcher) and maybe it'd better to define them in the lib instrumentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! That's the approach Datadog has implemented, it would be nice for us to follow it at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - I originally thought about adding some void excludeClasses(ExcludeBuilder) method to InstrumentationModule

@@ -20,3 +20,7 @@ dependencies {
testImplementation group: 'com.sun.activation', name: 'jakarta.activation', version: '1.2.2'
}
}

tasks.withType(Test) {
jvmArgs "-Dio.opentelemetry.javaagent.shaded.io.opentelemetry.context.enableStrictContext=false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that our ratpack instrumentation actually leaks context, or is it an artifact of something else? If we are really leaking contexts, should we create issues to fix all of these, so we make sure to circle back and fix them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, filed an issue

@iNikem iNikem merged commit dcd316d into open-telemetry:main Mar 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants