Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-spec / AWS Lambda Detector / Add escape hatch for synchronous resource detection #1273

Closed
NPellet opened this issue Nov 7, 2022 · 15 comments
Labels

Comments

@NPellet
Copy link
Contributor

NPellet commented Nov 7, 2022

Is your feature request related to a problem? Please describe

  • We want to get telemetry running for our AWS Lambdas.
  • If resource detection is asynchronous, then:
    • So should the SDK initialization
    • Can't run the initialization with node -r (otherwise setup might not be done before the lambda has actually run)
    • Has to change application code (I have hundreds and hundreds of lambdas...)
    • Need to create one new handler file for each lambda (very impractical)

Describe the solution you'd like to see

Luckily, the lambda detector is actually synchronous. As it turns out, in my case, it's the only asynchronous resource detection I'd like to run.
So until the opentelemetry-js core developers figure out how they want to handle asynchronous resources, I'd love to get a simple escape-hatch for the lambda resource detector, that would be out-of-spec, but would allow synchronous detection of the resource.

File: https:/open-telemetry/opentelemetry-js-contrib/blob/main/detectors/node/opentelemetry-resource-detector-aws/src/detectors/AwsLambdaDetector.ts

It could look something like:

export class AwsLambdaDetector implements Detector {
  detectSync(_config?: ResourceDetectionConfig): Resource {
    
    // same implementation as now
  }

  async detect( _config?: ResourceDetectionConfig): Promise<Resource> {
    return this.detect(_config);
  }
}

Describe alternatives you've considered

When node 19+ can be used as runtime in AWS Lambda, then we can use the --import cli option and a top-level await.

I'll happily open a PR if you'd consider it.

@Flarna
Copy link
Member

Flarna commented Nov 7, 2022

As alternative you could use @opentelemetry/sdk-trace-node instead @opentelemetry/sdk-node.

It's a lower layer part of SDK for traces only used internally by NodeSDK and requires a bit more lines to write to set it up. For example you have to provide Resource as config. But this has exactly the advantage that you can do your sync resource detection and then create the NodeTracerProvider,...

@Flarna
Copy link
Member

Flarna commented Nov 7, 2022

There was some discussion about sync resource detection: open-telemetry/opentelemetry-js#3055
And also a PR which looked quite promising as far as I remember: open-telemetry/opentelemetry-js#2933

but both were closed by the stale bot...

@NPellet
Copy link
Contributor Author

NPellet commented Nov 7, 2022

Well it's what I do, e.g. for EKS

const resource = Resource.default()
  .merge(
    new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: options.serviceName,
    })
  )
  .merge(
    await detectResources({
      detectors: [awsEksDetector],
    })
  );

/********* TRACING */
const tracingProvider = new NodeTracerProvider({
  resource,
});

But that's asynchronous, and that's the whole point isn't it ? I want to get rid of the detectResources and only use a synchronous call, like such:

const _awsLambdaDetector = new AwsLambdaDetector()
const resource = Resource.default().merge( _awsLambdaDetector.detectSync() ) })

I've been through all those stales issues. I didn't necessarily want to get into that topic so I thought, at least for AWS Lambdas there's no need to solve the underlying issue, we can just use the "escape hatch" that's out of the spec (as the Detector interface specifies that the detect() method be async)

@Flarna
Copy link
Member

Flarna commented Nov 7, 2022

Ah well sorry. Forgot that's already the detectors API which is async in all cases. I thought that it's the NodeSDK which introduces the async part in some case.

But actually running the AwsLambdaDetector via -r should work. I know it's async but there is no network request or similar which lets the event loop reach idle - it's just a microtask. I'm quite sure it will finish before your handler is called the first time.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 7, 2022

Oh... yeah you're right. I overlooked that. That should solve my issue and is satisfactory enough for my use case (I'll put a big comment in my code).

Although, as a user (before I started digging in the code), that was not obvious.
For all I know (and all I don't know, rather), detectResources might call a process.nextTick or something. As it's hidden in the implementation, I don't think I should be safely assuming the task will terminate immediately

But ok, that's definitely a fair point, and I'll run with it and we can leave it at that.

Thank you :)

@Flarna
Copy link
Member

Flarna commented Nov 7, 2022

The async resource detection is one of the pain points left to be solved. OTel spec requires resource to be immutable so detection needs to be done before the first span is started as resource is a config of the TracerProvider.
Some detectors are really async as they do HTTP request. They "infected" all other detectors by using a common API.

See above linked issue/PR for more details, in depth discussions and started solutions. I'm sure there will be some movement in this at some time but for now be happy that your detector is not "too async" :o).

Actually I think you could implement a true async lambda init via extension as you have a async init phase. But I doubt this is worth the effort in your case.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 7, 2022

Yes well we are actually extensions to load the dependent packages anyways.

But you may be right, I will check if a true async would be possible. So far we've only used external extensions, where this wouldn't apply. But perhaps internal ones... I'll investigate.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 8, 2022

It's me again !
Surprisingly, what we thought didn't work.

Given the following code:

// Asynchronous
const setup = async () => {

 const resource = Resource.default().merge(
      new Resource({
        [SemanticResourceAttributes.SERVICE_NAME]: options.serviceName,
      })
    )
    .merge(
      await detectResources({
        detectors: [awsLambdaDetector],
      })
    );
    // Rest of init...
    opentelemetry.diag.info("Finished loading the OTLP tracer");

}

setup();

I get:

  2022-11-08T20:09:24.870+01:00 2022-11-08T19:09:24.870Z undefined WARN @opentelemetry/instrumentation-aws-lambda Module /var/task/getAssetsEndpoint.js has been loaded before @opentelemetry/instrumentation-aws-lambda so it might not work, please initialize it before requiring /var/task/getAssetsEndpoint.js
  2022-11-08T20:09:24.907+01:00 2022-11-08T19:09:24.907Z undefined INFO Finished loading the OTLP tracer

Given the synchronous version:

// Synchronous
const setup = () => {

 const resource = Resource.default().merge(
      new Resource({
        [SemanticResourceAttributes.SERVICE_NAME]: options.serviceName,
      })
    );
    // Rest of init...
    opentelemetry.diag.info("Finished loading the OTLP tracer");
}

setup();

I get proper initialization (and instrumentation). I'll try to figure out why, but in the meantime, I'm back to square one.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 8, 2022

And on a sidenote, discussing lambda extensions, it sounded good but doesn't work as well as I planned.
Given an internal extension (runtime modification and not a separate process), I thought it would be possible to run:

#!/bin/bash
args=("$@")
OWN_FILENAME="$(basename $0)"
LAMBDA_EXTENSION_NAME="$OWN_FILENAME"
echo "${LAMBDA_EXTENSION_NAME}  launching extension"
node /opt/node_modules/path/to/my/otel/extension/code
echo "Finished setting up extension"
exec "${args[@]}" 

(Obviously) it doesn't work because the process (the first call to the node executable) finishes and all libraries unload. This was stupid of me to try.

Instead, what is needed is:

#!/bin/bash

args=("$@")
OWN_FILENAME="$(basename $0)"
LAMBDA_EXTENSION_NAME="$OWN_FILENAME" # (external) extension name has to match the filename
echo "${LAMBDA_EXTENSION_NAME}  launching extension"
NODE_PATH=NODE_PATH:/opt/node_modules
extra_args=("-r" "/opt/node_modules/path/to/otlp/extension/code")
args=("${args[@]:0:$#-1}" "${extra_args[@]}" "${args[@]: -1}")
echo "Executing ${args[@]}"
exec "${args[@]}" 

Which is just the same thing as setting the -r cli option, and therefore needs the synchronised init.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 8, 2022

Waw that's very interesting. The presence of the additional await for the Promise.all (and by extension, for the detectResources method), makes the rest of the script execute first, despite the inner function doSmth being called and resolving immediately

const doSmth = async() => {
  console.log("doSmth");
}

const main = async () => {
  await Promise.all( [ doSmth() ]);
  console.log( "done");
}

main();
console.log("afterMain");

/*
doSmth
afterMain
done
*/
const doSmth = async() => {
  console.log("doSmth");
}

const main = async () => {
  Promise.all( [ doSmth() ]); // <== Notice here the lack of await
  console.log( "done");
}

main();
console.log("afterMain");

/* Notice here the inverted logs
doSmth
done
afterMain
*/

@Flarna
Copy link
Member

Flarna commented Nov 9, 2022

Sees your ran into open-telemetry/opentelemetry-js#3146. Unfortunately NodeSDK instantiates instrumentations late these days... Workaround is to move this part into the sync startup path instead delegating this task to NodeSDK.

Regarding extensions I was thinking about one implemented in JS. So your JS code registers as internal extension and says done once SDK startup is finished.

If you don't await Promise.all() it means you don't wait until it is finished. your second sample above creates a Promise which is never used. Usually you need the result of doSmth() and this is within the Promise returned by Promise.all which is settled once all promises in the list are settled.

@NPellet
Copy link
Contributor Author

NPellet commented Nov 9, 2022

I absolutely agree. But basically all of this is to justify the associated PR.

In javascript, the internal extension could do something like

setupOTLP().then( () => {
  require("/** process the AWS lambda env variable to determine the script path */");
});

I could certainly do that, but it requires processing the CLI flags in bash (which is doable of course) and reuse them to launch the opentelemetry wrapper.

But I'm not sure how brittle this approach would be.

@Flarna
Copy link
Member

Flarna commented Nov 9, 2022

It's a while ago but I think what we did is to add a layer and a exec wrapper (AWS_LAMBDA_EXEC_WRAPPER) which injected the -r my_script.js to the node command.
The extension itself is within the above script and registers via a POST /extension/register and done via POST /extension/event/next. But this needs to be done for all invocations than also as far as I remember.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Jan 9, 2023
@github-actions
Copy link
Contributor

This issue was closed because it has been stale for 14 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants