Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix another deadlock in sensors system #2141

Merged
merged 2 commits into from
Sep 20, 2023
Merged

Conversation

iche033
Copy link
Contributor

@iche033 iche033 commented Sep 13, 2023

🦟 Bug fix

Summary

This PR:

  • Fixed a situation deadlock that occurs when waiting on the updateAvailable variable with a condition variable while that variable is also being written.
  • Reduced scope of a lock (timeLock). Not needed for fixing deadlock - can be done in a separate PR if needed.
    • Update: Reverted these changes.

More info on the deadlock:

These two operations can happen concurrently (from 2 threads):

        std::unique_lock<std::mutex> cvLock(this->dataPtr->renderMutex);
        // updateAvailable is true here
        this->dataPtr->renderCv.wait(cvLock, [this] {
          return !this->dataPtr->running || !this->dataPtr->updateAvailable; });
        // updateAvailable is set to false
        this->dataPtr->updateAvailable = true;
        this->dataPtr->renderCv.notify_one();

The wait condition may miss the notify_one() call causing it to sleep forever. To solve this, we need to make sure these two groups of operations are performed one after another. This fix is to group and lock the updateAvailable = [true|false] and notify_one calls using the same lock that's used by the wait condition variable.

Test it

Ran into this deadlock when adding the DvL sensor system to the Harmonic demo world. The DVL sensor is implemented as a custom rendering sensor. It changes the timing of the sensors system which revealed this deadlock issue.

Test with this pull request that adds a DVL sensor to the harmonic demo world:
gazebosim/harmonic_demo#9

Without the changes in this PR, the deadlock happens within 1 minute of running the simulation for me.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

@github-actions github-actions bot added the 🎵 harmonic Gazebo Harmonic label Sep 13, 2023
@iche033 iche033 changed the title Fix more deadlock in sensors system Fix another deadlock in sensors system Sep 13, 2023
@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Merging #2141 (da25558) into gz-sim8 (e9fa7d4) will decrease coverage by 0.02%.
Report is 2 commits behind head on gz-sim8.
The diff coverage is 100.00%.

❗ Current head da25558 differs from pull request most recent head b8ba42f. Consider uploading reports for the commit b8ba42f to get more accurate results

@@             Coverage Diff             @@
##           gz-sim8    #2141      +/-   ##
===========================================
- Coverage    65.47%   65.45%   -0.02%     
===========================================
  Files          323      323              
  Lines        30704    30708       +4     
===========================================
- Hits         20102    20101       -1     
- Misses       10602    10607       +5     
Files Changed Coverage Δ
src/systems/sensors/Sensors.cc 63.56% <100.00%> (+0.38%) ⬆️

... and 2 files with indirect coverage changes

@mjcarroll
Copy link
Contributor

This causes SensorsFixture/UpdateRate to fail, which is a bit concerning.

Signed-off-by: Ian Chen <[email protected]>
@iche033
Copy link
Contributor Author

iche033 commented Sep 14, 2023

This causes SensorsFixture/UpdateRate to fail, which is a bit concerning.

I ran CI again and confirmed that it failed again. I reverted the lock scope changes in b8ba42f and that seems to fix it. I ran CI two more times and test passes.

@azeey azeey added the beta Targeting beta release of upcoming collection label Sep 18, 2023
@iche033 iche033 merged commit 509d01a into gz-sim8 Sep 20, 2023
4 of 7 checks passed
@iche033 iche033 deleted the sensors_update_deadlock branch September 20, 2023 00:55
iche033 added a commit that referenced this pull request Oct 10, 2023
iche033 added a commit that referenced this pull request Oct 11, 2023
iche033 added a commit that referenced this pull request Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beta Targeting beta release of upcoming collection 🎵 harmonic Gazebo Harmonic
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants