Handle context cancellation properly #428
Merged
A previous fix, #391, attempted to address a lock that occurred on context cancellation. However, in doing so, it introduced a new lock. Essentially, if a message was not sent to the requestmanager/responsemanager goroutine, waiting for a response to that message could last indefinitely. The previous fix therefore stopped waiting when the calling context was cancelled. However, once a message reaches the goroutine of the requestmanager/responsemanager, it is important that it is processed to completion, so that the message loop doesn't lock. The proper fix is to detect when the message is sent successfully and, if so, wait for it to be processed. If it isn't sent, we can safely abort the goroutine immediately.
hannahhoward force-pushed the fix/requestmanager-lockup branch from 8637464 to 5b921b5 on August 1, 2023 22:05
rvagg approved these changes on Aug 1, 2023
This seems fine to me, the removal of the `case <-ctx.Done():` in the public API (client) functions being the key.
However, the failure of `TestBlockHooks/responding_to_extensions` in one of the CIs does ring a few bells because it seems close to this code.
Think I found it, unrelated, so double fix!
hannahhoward added a commit to filecoin-project/boost-graphsync that referenced this pull request on Aug 15, 2023
* fix(cancellation): handle message cancellation properly
A previous fix, ipfs#391, attempted to address a lock that occurred on context cancellation. However, in doing so, it introduced a new lock. Essentially, if a message was not sent to the requestmanager/responsemanager goroutine, waiting for a response to that message could last indefinitely. The previous fix therefore stopped waiting when the calling context was cancelled. However, once a message reaches the goroutine of the requestmanager/responsemanager, it is important that it is processed to completion, so that the message loop doesn't lock. The proper fix is to detect when the message is sent successfully and, if so, wait for it to be processed. If it isn't sent, we can safely abort the goroutine immediately.
* fix(race): resolve race condition with test responses
dirkmc added a commit to filecoin-project/boost-graphsync that referenced this pull request on Aug 23, 2023
Handle context cancellation properly (ipfs#428)
A previous fix, #391, attempted to address a lock that occurred when a client-facing function was called with a context that was cancelled. However, in doing so, that fix introduced a new, potentially more critical lock in the requestmanager/responsemanager message loop.
The original issue is as follows: if a message was not sent to the requestmanager/responsemanager goroutine because the calling context was cancelled, then subsequent code waiting for a response to that message could block indefinitely.
The previous fix therefore stopped waiting for a response when the calling context was cancelled.
However, once a message reaches the goroutine of the requestmanager/responsemanager, it is important that it is processed to completion, so that the message loop doesn't lock. If we stop waiting for a response, the message loop itself can lock while trying to send a response to the message.
The proper fix is to detect whether the message was sent to the message loop successfully or aborted due to context cancellation. If it is sent successfully before the calling context cancels, then we need to wait for it to be processed, even if the calling context cancels while it is being processed (this should be a minuscule amount of time). If it isn't sent before the context cancels, we can safely abort the goroutine immediately.