Instrument for AWS X-Ray with OpenTelemetry #1958

Open
patchwork01 opened this issue Mar 4, 2024 · 1 comment · May be fixed by #1962
Labels
enhancement New feature or request on-hold

Comments

@patchwork01
Collaborator

patchwork01 commented Mar 4, 2024

Background

Split from:

Description

We'd like to instrument our lambdas and Fargate tasks for AWS X-Ray, so that we can trace execution in a few cases:

  • A CloudWatch scheduled rule triggers a lambda, which passes messages through an SQS queue to another lambda which does something
  • A CloudWatch scheduled rule triggers a lambda, which passes a message through an SQS queue to a Fargate task which runs a job
  • A client sends a message on an SQS queue which is picked up by a Fargate task
  • A client sends a message on an SQS queue which is picked up by a lambda
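
In each of these cases the trace has to survive a hop through SQS. As a rough sketch of what is carried across that hop, assuming the standard X-Ray tracing header format (`Root=...;Parent=...;Sampled=...`), which SQS delivers to consumers in the `AWSTraceHeader` message system attribute: the `TraceHeader` class below is hypothetical and only illustrates the format. It is not part of Sleeper or of any AWS SDK, and the IDs in the example are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses an X-Ray tracing header string. */
final class TraceHeader {
    final String root;     // trace ID shared by every segment in the trace
    final String parent;   // ID of the segment that sent the message, or null
    final boolean sampled; // true only when the header says Sampled=1

    private TraceHeader(String root, String parent, boolean sampled) {
        this.root = root;
        this.parent = parent;
        this.sampled = sampled;
    }

    static TraceHeader parse(String header) {
        // Split "Key=Value" pairs separated by semicolons.
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        String root = fields.get("Root");
        if (root == null) {
            throw new IllegalArgumentException("No Root field in trace header: " + header);
        }
        return new TraceHeader(root, fields.get("Parent"), "1".equals(fields.get("Sampled")));
    }

    public static void main(String[] args) {
        // Example value in the X-Ray header format; the IDs are made up.
        TraceHeader header = parse(
                "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1");
        System.out.println(header.root + " " + header.parent + " " + header.sampled);
    }
}
```

A consumer that continues the trace would start its own segment under the same `Root`, with `Parent` pointing at the sender's segment. The instrumentation libraries discussed below normally do this automatically, so this is only meant to show what is being propagated.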

Analysis

OpenTelemetry instrumentation

We tried instrumenting with the ADOT Lambda layer:

https://aws-otel.github.io/docs/getting-started/lambda

This didn't fit into the code size limit for our lambdas.

We have a couple of alternatives to make it fit:

Alternative libraries

We could use the AWS X-Ray SDK or OpenTelemetry:

https://docs.aws.amazon.com/xray/latest/devguide/xray-instrumenting-your-app.html

Given that all of Sleeper is deployed in AWS, and all the entrypoints into Sleeper are packaged specifically for AWS, if all we need is default auto-instrumentation, it seems reasonable to use AWS' own SDK.

We can see how much information we get from adding the AWS X-Ray SDK in all deployed artifacts, and enabling tracing on lambdas attached to CloudWatch rules.

See the AWS documentation for AWS X-Ray instrumentation:

https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html
https://docs.aws.amazon.com/lambda/latest/dg/java-tracing.html#java-xray-sdk
https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-java.html

We split out a separate issue for the option to instrument with the AWS X-Ray SDK:

Modules to instrument

We'll need to make sure that no other module depends on a module we instrument, otherwise we'd add X-Ray instrumentation to those modules unintentionally. It's probably not a problem if X-Ray gets added to the system test drivers module, which depends on several modules we'll need to instrument.

The bulk import runner will run inside EMR, which seems like it might not work well with X-Ray. We can handle that separately if we want it later.

Ingest tasks are built from the ingest-runner module, but there are other modules that also depend on it, including the Trino plugin and the bulk import runner. We'll want to avoid adding X-Ray to those, so we'll need to split a separate module out of ingest-runner for the code that will run in ECS for the ingest task. We can make that a separate issue.

We could instrument the custom CDK resources but that seems unnecessary.

Modules to instrument, with any modules that depend on them:

  • athena (no dependents)
  • bulk-import-starter (depended on by system-test-drivers)
  • compaction-job-creation-lambda (no dependents)
  • compaction-job-execution (no dependents)
  • compaction-task-creation (no dependents)
  • garbage-collector (no dependents)
  • ingest-batcher-job-creator (no dependents)
  • ingest-batcher-submitter (depended on by system-test-data-generation)
  • ingest-starter (no dependents)
  • metrics (no dependents)
  • query-lambda (no dependencies)
  • splitter-lambda (no dependents)
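
The check described in this section could be automated along the following lines. This is a minimal sketch, not something that exists in the build: the `InstrumentationCheck` class is hypothetical, the dependency map is written by hand from the relationships mentioned above (with `trino` standing in for the Trino plugin module), and only direct dependencies are considered.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical check: which instrumented modules are depended on by other modules? */
final class InstrumentationCheck {

    /**
     * @param toInstrument        modules we plan to add X-Ray instrumentation to
     * @param dependenciesByModule for each module, the modules it depends on directly
     * @return for each instrumented module that has dependents, the set of those dependents
     */
    static Map<String, Set<String>> unintendedDependents(
            Set<String> toInstrument, Map<String, Set<String>> dependenciesByModule) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : dependenciesByModule.entrySet()) {
            for (String dependency : entry.getValue()) {
                if (toInstrument.contains(dependency)) {
                    // entry.getKey() would pick up the instrumentation unintentionally
                    result.computeIfAbsent(dependency, k -> new TreeSet<>()).add(entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative subset only; a real check would read this from the build.
        Set<String> toInstrument = Set.of(
                "athena", "bulk-import-starter", "ingest-batcher-submitter", "ingest-runner");
        Map<String, Set<String>> dependencies = Map.of(
                "system-test-drivers", Set.of("bulk-import-starter"),
                "system-test-data-generation", Set.of("ingest-batcher-submitter"),
                "trino", Set.of("ingest-runner"),
                "bulk-import-runner", Set.of("ingest-runner"));
        System.out.println(unintendedDependents(toInstrument, dependencies));
    }
}
```

Any module that appears as a key in the result either needs to be accepted as an exception, as with the system test modules, or split so that the instrumented code lives in a module nothing else depends on, as proposed for ingest-runner.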

Instrumentation libraries detail

We started by enabling auto-instrumentation with the AWS X-Ray SDK. This gives useful information, but to get more granular detail, e.g. on state store methods, we would need to add the SDK as a dependency to other modules. To report on individual method calls, the AWS X-Ray SDK requires at minimum a dependency on aws-xray-recorder-sdk-core. This brings in dependencies on aws-java-sdk-xray and aws-java-sdk-core, which seems a little excessive.

The AWS X-Ray SDK for Java also requires the X-Ray daemon, which we deployed as a sidecar to our Fargate task. This is a bit fiddly, as the memory and CPU requirements can be configured independently, and must be when running on EC2.

Once added, the AWS X-Ray SDK dependencies also aren't easy to disable. The library for setting up auto-instrumentation is a Maven dependency, and it's relatively heavyweight.

The AWS Distro for OpenTelemetry lets you report to AWS X-Ray using the OpenTelemetry libraries instead, which include a much more minimal API. It doesn't require the X-Ray daemon, and the auto-instrumentation is in agent code which doesn't need to be added as a dependency. We can try using that instead:

https://docs.aws.amazon.com/xray/latest/devguide/xray-instrumenting-your-app.html#xray-instrumenting-opentel
https://docs.aws.amazon.com/lambda/latest/dg/java-tracing.html#java-adot

@patchwork01 patchwork01 added the enhancement New feature or request label Mar 4, 2024
@patchwork01 patchwork01 added this to the 0.22.0 milestone Mar 4, 2024
@patchwork01 patchwork01 self-assigned this Mar 4, 2024
@gaffer01 gaffer01 modified the milestones: 0.22.0, 0.23.0 Mar 12, 2024
@patchwork01 patchwork01 removed their assignment Mar 12, 2024
@patchwork01 patchwork01 changed the title Instrument lambdas and Fargate tasks for AWS X-Ray Instrument for AWS X-Ray with OpenTelemetry Mar 13, 2024
@patchwork01
Collaborator Author

On hold because we might want to leave this until later if we don't need detailed tracing to test the transaction log state store.

@gaffer01 gaffer01 modified the milestones: 0.23.0, 0.24.0 Apr 16, 2024
@gaffer01 gaffer01 removed this from the 0.24.0 milestone May 24, 2024