Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add shoryuken instrumentation #536

Closed

Conversation

mcinbell
Copy link
Contributor

@mcinbell mcinbell commented Jun 30, 2023

This PR adds basic server instrumentation for Shoryuken.

See screenshot below of example trace viewed in Datadog:

Screenshot 2023-08-11 at 5 32 05 pm

@arielvalentin
Copy link
Collaborator

@mcinbell looks like there are some bundler issues https:/open-telemetry/opentelemetry-ruby-contrib/actions/runs/5487846178/jobs/9999873100?pr=536

@ericmustin ericmustin self-assigned this Jul 11, 2023
@ericmustin
Copy link
Contributor

will review this week

@arielvalentin
Copy link
Collaborator

Ok so then context propagation must not be working for the current implementation. Looks like it may need custom context propagation getter snd setter

@simi
Copy link
Contributor

simi commented Oct 26, 2023

Message doesn't respond to [], see its implementation - https:/ruby-shoryuken/shoryuken/blob/a2e2c89844a50dcef33d1949e2ad6b32a4962c2f/lib/shoryuken/message.rb. You're right, it is not usable for context propagation.

@mcinbell
Copy link
Contributor Author

Surprised I haven't noticed the error in our logs. May have been lost in the noise of other log messages in development and possibly not seen in production because it's logged as a warning.

Guess this hasn't been a problem for our use of it in production as it doesn't raise an exception and since I hadn't implemented the client side of things there was no context propagation expectation. As I've mentioned, I naively based this implementation heavily on the sidekiq implementation (see here) and since the implementation was working for us as is I figured it was good to share. I assume there's no real value in the context propagation extraction code without the client side injecting the context? Is it worth dropping this context propagation extraction code, with an appropriate note in the readme, so that the basic server side instrumentation implementation can be merged, with the idea that context propagation will come with the client side implementation?

@arielvalentin
Copy link
Collaborator

Shoryuken workers are able to process messages from any SQS client.

One of the first bug reports we'll likely receive is that the worker spans are disconnected from other traces so I hesitate to accept this PR or release this gem without context propagation working.


before do
allow(sqs_client).to receive(:get_queue_url).and_return(double(queue_url: 'https://sqs.fake.amazonaws.com/1/queue-name'))
allow(sqs_msg).to receive(:[]).with('baggage')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 removing this and following line will reveal the error in tests as well

@simi
Copy link
Contributor

simi commented Oct 28, 2023

@mcinbell I think you can follow the sidekiq in here and add client middleware using injector to inject traceparent into message body, and later extract from message.

  1. injection happens here

  2. extraction happens here

The only remaining question is where to store the traceparent. Looking at Shoryuken::Message message_attributes could be the best way to hijack for this purpose aka extracted_context = OpenTelemetry.propagation.extract(sqs_msg.message_attributes) in server middleware and OpenTelemetry.propagation.inject(sqs_msg.message_attributes) in client middleware.

@mcinbell
Copy link
Contributor Author

mcinbell commented Nov 3, 2023

@simi, yup, was aware of where the client side injected the context. My point was that since I hadn't implemented the client side context injection, the context extraction code seemed redundant until that context injection was happening in a client side implementation. Happy to update the implementation to attempt a context extraction using the message_attributes. As I think I mentioned in an earlier comment, not sure when I'll be able to find the time to implement the client side which would do the injection. It's in our backlog, just not a pressing priority at the moment.

@mcinbell
Copy link
Contributor Author

mcinbell commented Nov 3, 2023

@arielvalentin, with regards to context propagation not working from other clients, I am not very familiar with how the context propagation works but had assumed that it would be relatively dependent on context injected from the associated client code and had assumed shoryuken is generally used for both client and worker side, since it's largely modelled on Sidekiq. I guess if this is a concern and it's not sufficient to include a known issue regarding context propagation in the readme then this PR will have to wait till I find the time to implement and test the client side. I will make the change suggested by @simi to use the message_attributes and will attempt to find some time over the next few weeks to implement the client side.

@simi
Copy link
Contributor

simi commented Nov 4, 2023

@mcinbell I can take a a look and provide PR to your branch.

Copy link
Collaborator

@arielvalentin arielvalentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simi and I were discussing this PR at length in the CNCF slack and I think we have identified a few use cases that we think will change how this instrumentation should work.

The AWS SDK instrumentation is missing Consumer Content Propagation. This means that all AWS SQS Consumers are disconnected from the incoming traces:

https:/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/aws_sdk/lib/opentelemetry/instrumentation/aws_sdk/handler.rb#L34

Assuming the user turns on the AWS SDK instrumentation in conjunction with this Shoryuken instrumentation, you will see multiple consumer spans that represent the same message but likely disconnected from each other.

That is not ideal.

We believe that this instrumentation should require the AWS SDK Instrumentation and either enrich the Consumer span generated there or create an internal span that contains many, if not all, of the same attributes.

We have a precedent set in Sinatra and Rails/ActionPack instrumentations, that depend on the Rack instrumentation to be installed in order for them to also be installed. That would remove the requirement for Shoryuken from having to perform context extraction when creating a new span.

Since I am not an AWS or Shoryuken user, I do not have a strong preference for whether or not to enrich the AWS span or create an internal span, so I will leave that to you and @simi to decide.

@mcinbell
Copy link
Contributor Author

@simi and I were discussing this PR at length in the CNCF slack and I think we have identified a few use cases that we think will change how this instrumentation should work.

The AWS SDK instrumentation is missing Consumer Content Propagation. This means that all AWS SQS Consumers are disconnected from the incoming traces:

https:/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/aws_sdk/lib/opentelemetry/instrumentation/aws_sdk/handler.rb#L34

Assuming the user turns on the AWS SDK instrumentation in conjunction with this Shoryuken instrumentation, you will see multiple consumer spans that represent the same message but likely disconnected from each other.

That is not ideal.

We believe that this instrumentation should require the AWS SDK Instrumentation and either enrich the Consumer span generated there or create an internal span that contains many, if not all, of the same attributes.

We have a precedent set in Sinatra and Rails/ActionPack instrumentations, that depend on the Rack instrumentation to be installed in order for them to also be installed. That would remove the requirement for Shoryuken from having to perform context extraction when creating a new span.

Since I am not an AWS or Shoryuken user, I do not have a strong preference for whether or not to enrich the AWS span or create an internal span, so I will leave that to you and @simi to decide.

@arielvalentin @simi Just catching up on the discussion... Yes, makes complete sense for the instrumentation to require the AWS SDK and completing the consumer side context extraction within the AWS SDK also sounds good. With regards to the question of enriching the AWS span or creating an internal span, am not sure I fully appreciate the implications of either, though creating the internal span would at least seem to keep the context of any additional attributes distinct.

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Dec 11, 2023
@github-actions github-actions bot removed the stale Marks an issue/PR stale label Dec 12, 2023
@mcinbell mcinbell requested a review from a team January 5, 2024 05:44
Copy link
Contributor

github-actions bot commented Feb 5, 2024

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Feb 5, 2024
@github-actions github-actions bot closed this Feb 21, 2024
@arinhouck
Copy link

arinhouck commented Oct 16, 2024

Is this no longer being considered? @mcinbell @kaylareopelle @arielvalentin

I'm curious what is left to get this to move forward.

@kaylareopelle
Copy link
Contributor

Hi @arinhouck! Thanks for bringing this up.

I think the main question blocking this PR is in this comment. Since Shoruyken has a relationship with AWS, I think we may need to verify that everything works well with them together.

There have also been some updates to the AWS SDK instrumentation lately (#1186, #1150), and I'm not sure how they would interact with this instrumentation.

I'm not sure what we do for the Rack-adjacent instrumentation, but maybe some sort of test validating the interaction would help?

@mcinbell @simi @arielvalentin - I think you all were a lot closer to this than I was. Maybe you can chime in?

@arinhouck
Copy link

arinhouck commented Oct 16, 2024

I grasp things a bit better now thanks @kaylareopelle for the overview and quick response, I was just introduced to open telemetry very recently so I won't be of great use in this contribution quite yet. I found this snippet as an great first option that I will try out if others are looking for a simple first step to instrumenting Shoryuken until an official implementation is in place. I will keep an eye on this thread and familiarize myself with the current contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed stale Marks an issue/PR stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants