Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault when destroying Event objects #367

Closed
wants to merge 4 commits into from

Conversation

azeey
Copy link
Contributor

@azeey azeey commented Jun 7, 2022

🦟 Bug fix

Summary

The segfault is caused when attempting to polymorphically destroy an EventT object that was instantiated inside a plugin (or any shared library loaded via dlopen) and the plugin has since been unloaded. Since EventT is a template, it's destructor is instantated in the translation unit that is part of the plugin and when that plugin is unloaded, the destructor is removed from memory as well. Gazebo stores pointers of the base class Event (not EventT) in a container. Since the destructor of Event is virtual, the system will try to call the destructor of EventT when destroying that container, but that function is no longer available.

The solution is to ensure the destructor of EventT is never called. In this PR, this is accomplished by making Event's destructor non-virtual and moving all data members to the base class. I believe it's safe to not call EventTs destructor as long as the destructor is trivial and EventT has no data members, but I would appreciate any feedback if this is not the case.

Note, this will not fix the issue if an application stores EventT objects (instead of Event objects) instantiated by plugins and the plugins get unloaded.

To test, run ign gazebo shapes.sdf and exit using ctrl-c.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

@github-actions github-actions bot added the 🌱 garden Ignition Garden label Jun 7, 2022
@codecov
Copy link

codecov bot commented Jun 7, 2022

Codecov Report

Merging #367 (3a38196) into gz-common5 (6b749fe) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@              Coverage Diff               @@
##           gz-common5     #367      +/-   ##
==============================================
- Coverage       77.88%   77.88%   -0.01%     
==============================================
  Files              84       84              
  Lines           10728    10724       -4     
==============================================
- Hits             8356     8352       -4     
  Misses           2372     2372              
Impacted Files Coverage Δ
events/include/gz/common/Event.hh 100.00% <100.00%> (ø)
events/src/Event.cc 95.45% <100.00%> (+2.86%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Addisu Z. Taddese <[email protected]>
@azeey
Copy link
Contributor Author

azeey commented Jun 7, 2022

I've suppressed windows C4251 warnings in 5a6d88d although some of the data members are protected and maybe accessed by derived classes (I think the suppression is meant for when the members are private https:/gazebosim/gz-utils/blob/cdd48b33405fe42f400f8ca1238115c0bcd265e3/include/gz/utils/SuppressWarning.hh#L63-L65). If this is a concern, we can make them private and create accessor member functions.

Copy link
Contributor

@mjcarroll mjcarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach seems sound. I just want to make sure that we aren't counting on current behaviors downstream (at least in our implementations).

events/include/gz/common/Event.hh Show resolved Hide resolved
events/include/gz/common/Event.hh Show resolved Hide resolved
events/include/gz/common/Event.hh Show resolved Hide resolved
events/include/gz/common/Event.hh Show resolved Hide resolved
@azeey azeey marked this pull request as ready for review June 13, 2022 21:15
Copy link
Contributor Author

@azeey azeey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test in 0c03f6b.

events/include/gz/common/Event.hh Show resolved Hide resolved
@chapulina chapulina added bug Something isn't working Breaking change Breaks API, ABI or behavior. Must target unstable version. labels Jul 25, 2022
@chapulina chapulina changed the base branch from main to gz-common5 August 5, 2022 19:06
@azeey
Copy link
Contributor Author

azeey commented Aug 15, 2022

The solution in this PR relies on an undefined behavior (deleting through a base pointer even though it's not virtual) and puts an unusual constraint on users of the EventT class (no data members). So, I explored a different solution using std::shared_ptr and std::weak_ptr where EventT objects created by plugins are stored in the Connection returned by EventT::Connect as shared pointers. In addition, in gz-sim's EventManager, we store weak pointers to the events instead of (unique_ptrs)(https:/gazebosim/gz-sim/blob/bfbe8688ff94bd4e10a49ad94c28b36f8db38ddd/include/gz/sim/EventManager.hh#L72). The hope was that when a plugin gets unloaded, it deletes its shared pointer and if the event is no longer being used, it will be deleted. Subsequently, when EventManager tries to lock the weak pointer, it would fail, at which point, it will create a new EventT object that resides in a different plugin/shared library (this would happen when EventManager::Connect is called from a plugin).

That seems to fix the segfault when exiting gz-sim, but upon further testing in a more narrowly scoped test, I've found that even the destruction of the weak pointer stored in EventManger can attempt to access memory that has been unloaded when a plugin got unloaded. So, unfortunately, I don't think this is a good solution either.

The only solution I'm left with is to provide a new API in gz-plugin that allows loading plugins with RTLD_NODELETE set. This ensures that the shared librares associated with plugins are never actually deleted when dlclose is called. I've proved to myself that this works and fixes the segfault issue, but it warrants further discussion with the team.

@mjcarroll
Copy link
Contributor

Superceded by gazebosim/gz-plugin#102 and friends.

@mjcarroll mjcarroll closed this Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking change Breaks API, ABI or behavior. Must target unstable version. bug Something isn't working 🌱 garden Ignition Garden
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants