Trace distributed work related to consuming SQS messages #4730
Builds on top of #4739
What Does This Do
This PR enables more accurate distributed tracing of work related to consuming SQS messages.
The resulting traces (and the configuration switches that control them) are modelled on existing messaging instrumentations such as Kafka and JMS.
It also avoids generating consuming spans when SQS.receiveMessage returns no messages.
Additional Notes
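To illustrate the consume-side behaviour described above, here is a minimal Java sketch that assumes an iterator-wrapping approach. This is one common way to scope consume spans; whether this PR uses it is an assumption. SketchSpan and TracingMessageIterator are hypothetical names, not dd-trace-java classes. A span opens when the application takes a message and closes when it asks for the next one or reaches the end. An empty receiveMessage result never opens a span.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a tracer span; not the dd-trace-java API.
final class SketchSpan {
    final String operation;
    final String messageId;
    boolean finished;

    SketchSpan(String operation, String messageId) {
        this.operation = operation;
        this.messageId = messageId;
    }
}

// Hypothetical wrapper over the messages returned by one receive call.
// A consume span is opened when a message is taken and finished when the
// caller moves on, so it covers the work done on that message.
final class TracingMessageIterator implements Iterator<String> {
    private final Iterator<String> delegate;
    private final List<SketchSpan> recorded; // stands in for a span exporter
    private SketchSpan current;

    TracingMessageIterator(List<String> messageIds, List<SketchSpan> recorded) {
        this.delegate = messageIds.iterator();
        this.recorded = recorded;
    }

    @Override
    public boolean hasNext() {
        boolean more = delegate.hasNext();
        if (!more) {
            finishCurrent(); // work on the last message is complete
        }
        return more;
    }

    @Override
    public String next() {
        finishCurrent(); // work on the previous message is complete
        String id = delegate.next(); // throws NoSuchElementException when exhausted
        current = new SketchSpan("sqs.consume", id);
        recorded.add(current);
        return id;
    }

    private void finishCurrent() {
        if (current != null) {
            current.finished = true;
            current = null;
        }
    }
}
```

Because each span is finished lazily on the next hasNext() or next() call, whatever the application does between those calls is attributed to the message it just took. An empty batch opens nothing.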
Example SQS distributed trace (default configuration)
The Java tracer will track work done while consuming SQS messages and attempt to connect that work back to the trace that produced those messages. The producing and consuming spans, as well as any work done while consuming messages, will have a service name of "sqs".
Example SQS distributed trace with time-in-queue span
When legacy SQS tracing is disabled, the Java tracer will add a "time-in-queue" span representing the time the message spent on the queue between being produced and consumed.
The producing and consuming spans, plus any work related to that message, will now use the application's service name. Only the "time-in-queue" span will have a service name of "sqs".
Disable legacy SQS tracing with either of the following settings:
-Ddd.sqs.legacy.tracing.enabled=false
DD_SQS_LEGACY_TRACING_ENABLED=false
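The service-name rules above can be summarised in a small Java sketch. SqsSpanKind and SqsServiceNames are hypothetical names. Using the SQS SentTimestamp message attribute (epoch milliseconds) as the start of the time-in-queue span is also an assumption made for illustration, not a detail taken from the tracer.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical span categories for one produce/consume round trip.
enum SqsSpanKind { PRODUCE, TIME_IN_QUEUE, CONSUME, MESSAGE_WORK }

// Hypothetical model of the service names described above.
final class SqsServiceNames {
    static final String SQS_SERVICE = "sqs";

    static String serviceName(SqsSpanKind kind, boolean legacySqsTracing, String appService) {
        if (legacySqsTracing) {
            // Legacy mode: producing, consuming and message-related work all report as "sqs".
            // (No time-in-queue span is created in this mode.)
            return SQS_SERVICE;
        }
        // Non-legacy mode: only the time-in-queue span keeps the "sqs" service name.
        return kind == SqsSpanKind.TIME_IN_QUEUE ? SQS_SERVICE : appService;
    }

    // Assumption: time-in-queue runs from SentTimestamp to the moment the consumer received the message.
    static Duration timeInQueue(long sentTimestampMillis, Instant receivedAt) {
        Duration d = Duration.between(Instant.ofEpochMilli(sentTimestampMillis), receivedAt);
        return d.isNegative() ? Duration.ZERO : d; // clamp clock skew between hosts
    }
}
```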
Example trace without any special SQS handling
To restore the behaviour of release 0.88.0 and earlier, where AWS-SDK calls were modelled as simple HTTP client calls and included the underlying HTTP span, re-enable legacy AWS-SDK tracing with either of the following settings:
-Ddd.aws-sdk.legacy.tracing.enabled=true
DD_AWS_SDK_LEGACY_TRACING_ENABLED=true
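Each system property in this PR has an environment-variable twin. The name is upper-cased, and both . and - become _. The sketch below derives one form from the other and reads a boolean flag. DdConfigNames is a hypothetical helper. Giving the -D system property precedence over the environment variable is an assumption.

```java
import java.util.Locale;

// Hypothetical helper; not the dd-trace-java configuration API.
final class DdConfigNames {
    // dd.aws-sdk.legacy.tracing.enabled -> DD_AWS_SDK_LEGACY_TRACING_ENABLED
    static String toEnvVar(String systemProperty) {
        return systemProperty.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    // Assumption: a -D system property wins over the matching environment variable.
    static boolean flag(String systemProperty, boolean defaultValue) {
        String value = System.getProperty(systemProperty);
        if (value == null) {
            value = System.getenv(toEnvVar(systemProperty));
        }
        // Sketch only: accepts "true" (case-insensitive); anything else is false.
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }
}
```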
Example disconnected SQS trace
Tracking and including work related to SQS messages in the original trace can produce very large traces.
Use either of the following settings on selected processes to break very long distributed traces into more manageable chunks.
-Ddd.sqs.propagation.enabled=false
DD_SQS_PROPAGATION_ENABLED=false
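Here is a minimal sketch of what switching SQS propagation off means for a consumer. It assumes the producer's context arrives in the SQS AWSTraceHeader message system attribute. This is an assumption, and SqsParentLookup is a hypothetical name. With propagation enabled, the consume span continues that context. Otherwise it starts a new trace, which is what splits one long trace into smaller ones.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of the tracer.
final class SqsParentLookup {
    // Assumption: the upstream trace header is carried in this system attribute.
    static final String TRACE_ATTRIBUTE = "AWSTraceHeader";

    // Returns the header to continue from, or empty when the consume span
    // should start a new trace instead.
    static Optional<String> parentHeader(Map<String, String> systemAttributes,
                                         boolean sqsPropagationEnabled) {
        if (!sqsPropagationEnabled) {
            // dd.sqs.propagation.enabled=false: ignore upstream context so this
            // process starts its own, smaller trace.
            return Optional.empty();
        }
        return Optional.ofNullable(systemAttributes.get(TRACE_ATTRIBUTE));
    }
}
```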
Example disconnected SQS trace with time-in-queue span
You can also turn off propagation at the AWS-SDK level. This stops the X-Amzn-Trace-Id header from being generated for all AWS-SDK calls, not just SQS calls.
Note that you can still enable "time-in-queue" spans even when the X-Amzn-Trace-Id header is not available, as in the following combination:
-Ddd.sqs.legacy.tracing.enabled=false -Ddd.aws-sdk.propagation.enabled=false
DD_SQS_LEGACY_TRACING_ENABLED=false; DD_AWS_SDK_PROPAGATION_ENABLED=false
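For reference, the X-Amzn-Trace-Id header follows the AWS X-Ray format of semicolon-separated Key=Value pairs. The sketch below parses it and treats a missing header as "no parent". AmznTraceHeader is a hypothetical helper, and this PR does not state which fields the Java tracer actually writes into the header.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of the tracer or the AWS SDK.
final class AmznTraceHeader {
    // Parses e.g. Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return fields;
    }

    // With dd.aws-sdk.propagation.enabled=false the header is never written, so
    // there is no parent to link to; the PR notes time-in-queue spans can still be enabled.
    static Optional<String> parentSpanId(String headerOrNull) {
        if (headerOrNull == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(parse(headerOrNull).get("Parent"));
    }
}
```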