Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXPORTER] Fix OTLP HTTP exporting in sync mode. #2193

Merged
merged 3 commits into from
Jun 13, 2023

Conversation

owent
Copy link
Member

@owent owent commented Jun 12, 2023

Fixes #2191

Changes

Please provide a brief description of the changes here.

  • Use shared_ptr to hold the lifetime when using OTLP HTTP exporting in sync mode.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@owent owent requested a review from a team June 12, 2023 08:09
@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Merging #2193 (c1b03d2) into main (4b20067) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2193   +/-   ##
=======================================
  Coverage   87.52%   87.52%           
=======================================
  Files         169      169           
  Lines        4887     4887           
=======================================
  Hits         4277     4277           
  Misses        610      610           

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

From the bug report,

the lambda capture [&session_result] causes a capture of a reference to a local variable, causing the bug.

From the fix,

the lambda capture [session_result] now captures a shared pointer, presumably incrementing the reference counter upon capture.

From existing code,

the callback function in OtlpHttpClient::Export can be given to a session, with

auto session = createSession(message, std::move(result_callback));

and then the session, which owns the callback, is stored in running_sessions_ by:

addSession(std::move(opentelemetry::nostd::get<HttpSessionData>(session)));

So, when some code paths return from ::Export() on timeout, the callback is kept elsewhere already, and local variables (in the old code) are destroyed.

Please:

  1. Confirm my understanding is correct

  2. Detail where / when is the destructor called for the lambda capture [session_result], that would decrement the last reference to the session_result, and destroy the object created by make_shared in the fix ... I don't understand this part.

@owent
Copy link
Member Author

owent commented Jun 13, 2023

Thanks for the fix.

From the bug report,

the lambda capture [&session_result] causes a capture of a reference to a local variable, causing the bug.

From the fix,

the lambda capture [session_result] now captures a shared pointer, presumably incrementing the reference counter upon capture.

From existing code,

the callback function in OtlpHttpClient::Export can be given to a session, with

auto session = createSession(message, std::move(result_callback));

and then the session, which owns the callback, is stored in running_sessions_ by:

addSession(std::move(opentelemetry::nostd::get<HttpSessionData>(session)));

So, when some code paths return from ::Export() on timeout, the callback is kept elsewhere already, and local variables (in the old code) are destroyed.

Please:

1. Confirm my understanding is correct

2. Detail where / when is the destructor called for the lambda capture `[session_result]`, that would decrement the last reference to the session_result, and destroy the object created by make_shared in the fix ... I don't understand this part.
  1. Yes, that's right.
  2. The callback is stored in HttpSessionData which is holded by OtlpHttpClient::running_sessions_, OtlpHttpClient::gc_sessions_ .We remove session data from running_sessions_ only when we get a event or response from http client.And when wait for response in sync mode, session_waker_.wait_for may be timeout before we get any event from it.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix and clarifications.

Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the fix. This part of code has lots of edge scenarios, would be good to extend the unit-tests to cover them sometime later.

@marcalff
Copy link
Member

/easycla

@marcalff marcalff merged commit fc8853e into open-telemetry:main Jun 13, 2023
@ThomsonTan
Copy link
Contributor

Seems this current fix requires a heap allocation, even this is sync Export. And the heap allocation is only for the edge case (timeout). Wondering is it possible to avoid the heap allocation on ExportResult here?

@owent
Copy link
Member Author

owent commented Jun 15, 2023

Seems this current fix requires a heap allocation, even this is sync Export. And the heap allocation is only for the edge case (timeout). Wondering is it possible to avoid the heap allocation on ExportResult here?

By now, I have no idea without break the behaviour of return value.

@owent owent deleted the fix_2191 branch July 4, 2023 10:47
@marcalff marcalff changed the title Fix OTLP HTTP exporting in sync mode. [EXPORTER] Fix OTLP HTTP exporting in sync mode. Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OTLP http exporter may cause crash in sync mode
4 participants