Fix EntityComponentManager race condition #601
Other than my comment about API/ABI below, you mentioned that this change causes a decrease in performance. Do you have any numbers that measure the change in performance before/after this change?
My apologies, I meant scene loading performance. I don't have any numbers; it's just that before this fix the scene would show up as a whole, whereas with this fix the models show up in stages, so it might not be such a bad thing after all. cc @luca-della-vedova have you seen this stage-by-stage loading behaviour with the clinic demo?
I believe this is the underlying problem. We shouldn't be calling
If one of these suggestions works, I think it could be more reliable than the current proposal, which requires each plugin to protect its own updates. We should make the updates thread-safe by default so plugins don't need to worry about it.
Thanks for the suggestions. I've attempted the 2 suggestions:
This ends up in a deadlock. My guess is some plugin is likely waiting for some results in another plugin.
This works great, I've put it into the PR but it ends up with messages that the ign transport node is not ready, see below.
I haven't found any function that allows me to wait on the transport layer node (or check if it's ready) in
Signed-off-by: ddengster <[email protected]>
Signed-off-by: Louise Poubel <[email protected]>
it ends up with messages that the ign transport node is not ready,
I think that message is misleading, what's happening is that we're trying to advertise the same service several times. I took the liberty of pushing a fix in 8a3734f, let me know what you think. This is ready to merge if that works for you. Thanks!
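As a rough illustration of the "advertise the same service several times" problem, here is a minimal sketch of an advertise-once guard. The `ServiceRegistry` class and the service name are hypothetical; this is not the ign-transport API and not necessarily what the fix commit does. It only shows the general idea of making repeated registration a no-op.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical registry, for illustration only (not an ign-transport class).
class ServiceRegistry
{
public:
  // Returns true the first time a name is registered, false on repeats.
  bool AdvertiseOnce(const std::string &name)
  {
    std::lock_guard<std::mutex> lock(this->mutex);
    // std::set::insert reports whether the element was newly inserted.
    return this->advertised.insert(name).second;
  }

  // Number of distinct services registered so far.
  std::size_t Count() const
  {
    std::lock_guard<std::mutex> lock(this->mutex);
    return this->advertised.size();
  }

private:
  mutable std::mutex mutex;
  std::set<std::string> advertised;
};
```

A caller would check the return value and skip the real advertise call when it is false.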
You're welcome. The advertising error messages are gone now; let's merge it if there are no more outstanding problems.
This PR attempts to fix an incredibly hard to reproduce race condition that results in a std::map::at() crash in the View::ComponentImplementation() function. Printouts show 2 threads trying to update the view.
Gists for the stack trace of both threads:
https://gist.github.com/luca-della-vedova/44fc24aa3794c70ff814feccb05d4831
https://gist.github.com/ddengster/a77c7bf9ade79ceb51182c6296c053ae
From examining the gists, it seems that one thread, upon a plugin added event (GuiRunner::OnPluginAdded), is creating a bunch of rendering entities and accessing components, while the other thread is receiving an ignition::msgs::SerializedStepMap message and trying to rebuild views (clearing the components map of the view). I've made sure both are not run in parallel in this PR and it seems to fix the issues (though my scene seems to be loading slower).
I've tried to push RebuildViews() to a delayed step, similar to processing requests in the GuiRunner::OnState function, but it seems that there are some dependencies on the results of that function, so I had to revert to this solution. It may be wise to re-think how threads access the ECM so as to prevent a whole lot of lockspam in the future.
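To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above: one thread clears and repopulates a map while another reads it with at(), and a single mutex keeps the two from running in parallel. The `GuardedView` class and `RunConcurrentAccess` function are hypothetical stand-ins, not the actual ign-gazebo View or GuiRunner code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a view that caches one component per entity.
class GuardedView
{
public:
  // Clears and repopulates the cache, as a rebuild on a new state might.
  void Rebuild(int entityCount)
  {
    assert(entityCount >= 0);
    std::lock_guard<std::mutex> lock(this->mutex);
    this->components.clear();
    for (int e = 0; e < entityCount; ++e)
      this->components[e] = e * 10;
  }

  // Looks up a component. Without the lock, at() could throw or race
  // if called while Rebuild() has cleared the map.
  int Component(int entity) const
  {
    std::lock_guard<std::mutex> lock(this->mutex);
    return this->components.at(entity);
  }

private:
  mutable std::mutex mutex;
  std::map<int, int> components;
};

// Runs a rebuilding thread and a reading thread side by side and
// returns how many lookups failed or returned an unexpected value.
int RunConcurrentAccess(int iterations)
{
  GuardedView view;
  view.Rebuild(8);
  int failures = 0;  // Written only by the reader thread.

  std::thread rebuilder([&view, iterations]()
  {
    for (int i = 0; i < iterations; ++i)
      view.Rebuild(8);
  });

  std::thread reader([&view, &failures, iterations]()
  {
    for (int i = 0; i < iterations; ++i)
    {
      try
      {
        if (view.Component(i % 8) != (i % 8) * 10)
          ++failures;
      }
      catch (const std::out_of_range &)
      {
        ++failures;
      }
    }
  });

  rebuilder.join();
  reader.join();
  return failures;
}
```

Because the lock is held for the whole rebuild, the reader never observes the cleared map. The trade-off is the one noted above: readers wait while a rebuild is in progress, and every access path has to take the lock.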