Handle context cancellation properly #428

Merged 2 commits on Aug 2, 2023


hannahhoward
Collaborator

A previous fix, #391, attempted to address a lock that occurred when a client-facing function was called with a context that was cancelled. However, in doing so, that PR introduced a new, potentially more critical lock in the request manager/response manager message loop.

The original issue is as follows: if a message was not sent to the requestmanager/responsemanager goroutine because the calling context was cancelled, then subsequent code waiting for a response to that message could block indefinitely.

The previous fix therefore stopped waiting for a response when the calling context cancelled.

However, once a message reaches the goroutine of the requestmanager/responsemanager, it's important that it's processed to completion, so that the message loop doesn't lock. If we stop waiting for a response, the message loop itself can lock trying to send a response to the message.

The proper fix is to detect whether the message was sent to the message loop successfully or aborted due to context cancellation. If it is sent successfully before the calling context cancels, then we need to wait for it to be processed, even if the calling context cancels while it's being processed (this should be a minuscule amount of time). If it isn't sent before the context cancels, we can safely abort the goroutine immediately.

A previous fix, #391, attempted to address a lock that occurred on context cancel. However, in doing
so, it introduced a new lock. Essentially, if a message was not sent to the
requestmanager/responsemanager goroutine, waiting for a response to that message could last
indefinitely. The previous fix therefore stopped waiting when the calling context cancelled. However,
once a message reaches the goroutine of the requestmanager/responsemanager, it's important that
it's processed to completion, so that the message loop doesn't lock. The proper fix is to detect
when the message is sent successfully, and if so, wait for it to be processed. If it isn't sent, we
can safely abort the goroutine immediately.
Member

@rvagg rvagg left a comment


This seems fine to me, with the removal of the `case <-ctx.Done():` in the public API (client) functions being the key.

However the failure of TestBlockHooks/responding_to_extensions in one of the CIs does ring a few bells because it seems close to this code.

@hannahhoward
Collaborator Author

think I found it, unrelated, so double fix!

@hannahhoward hannahhoward merged commit cb19a45 into main Aug 2, 2023
18 checks passed
hannahhoward added a commit to filecoin-project/boost-graphsync that referenced this pull request Aug 15, 2023
* fix(cancellation): handle message cancellation properly


* fix(race): resolve race condition with test responses
dirkmc added a commit to filecoin-project/boost-graphsync that referenced this pull request Aug 23, 2023