Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent deadlock when notifying shard of ledger #3683

Closed
wants to merge 1 commit into from

Conversation

miguelportilla
Copy link
Contributor

High Level Overview of Change

This change is primarily about preventing a potential thread deadlock when notifying shards on ledger completions.

Context of Change

Shards are notified by the inbound ledger process when pertaining ledgers have been acquired. Upon notification, the SQLite database is populated with ledger data. If the ledger data resides in the same shard, it is possible to encounter a deadlock when shard fetch calls are made while storing to the SQLite database.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change that only restructures code)

Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I had one question about who closes the backend_ in one case, and left two nits. Other than that, nice solution to the deadlock issue!

src/ripple/nodestore/impl/Shard.h Outdated Show resolved Hide resolved
src/ripple/nodestore/impl/Shard.cpp Show resolved Hide resolved
{
JLOG(j_.error()) << "shard " << index_
<< " missing acquire SQLite database";
auto const scopedCount{makeBackendCount()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 153 of this file (not in this changeset), we check the backendCount_ and exit early if in use. We also lock mutex_ but not storedMutex_ there. The old code would have closed the backend_ but the new code may not.

Who closes backend_ in such a case? Is it DatabaseShardImp::sweep()? If nobody does, an easy fix is to lock the storedMutex there as well. If it's not an issue, the of couse fine as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, DatabaseShardImp::sweep() will close it or the shard Dtor. I don't see an issue with tryClose and storedMutex_ as is, but maybe I missed something you saw.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compiled these changes together with Deterministic shards PR and run it on mainnet. It successfully collecting shards.

Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@miguelportilla miguelportilla added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Dec 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants