Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(@libp2p/tcp): race condition in onSocket #2763

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tabcat
Copy link
Contributor

@tabcat tabcat commented Oct 11, 2024

Title

fix(@libp2p/tcp): race condition in onSocket

Description

This fixes a race condition in the onSocket listener method.
onSocket gets a net.Socket parameter that needs to be closed later
before the tcp server can close.

The net.Socket is used to create a MultiaddrConnection but the
connection is not tracked with this.connections until after it has been
upgraded.

If the tcp listener.close is called while a the MultiaddrConnection is waiting
to be upgraded, then the MultiaddrConnection socket cannot be closed as
it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on
creation I decided to create a separate property named
this.maConnections. I decided to track the non-upgraded connections
separately because I was unsure how this.connections must be used with other things
like metrics.

Notes & open questions

Closes #2760

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

This fixes a race condition in the onSocket listener method.
onSocket gets a net.Socket parameter that needs to be closed later
before the tcp server can close.
The net.Socket is used to create a MultiaddrConnection but the
connection is not tracked with this.connections until after it has been
upgraded.
If the tcp listener.close is called while a the MultiaddrConnection is waiting
to be upgraded, then the MultiaddrConnection socket cannot be closed as
it does not exist in this.connections.

Instead of adding the MultiaddrConnection to this.connections on
creation I decided to create a separate property named
this.maConnections. I decided to track the non-upgraded connections
separately because I was unsure how this.connections must be used with other things
like metrics.
@tabcat tabcat marked this pull request as ready for review October 11, 2024 12:16
@tabcat tabcat requested a review from a team as a code owner October 11, 2024 12:16
Copy link
Member

@achingbrain achingbrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is necessary. When we close the transport and call this.pause(true) we close the socket server which should destroy all open sockets.

This is the same as calling maConn.abort(err) - the interesting thing it does is call socket.destroy() - see toMultiaddrConnection for this.

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

packages/transport-tcp/src/listener.ts Show resolved Hide resolved
@achingbrain
Copy link
Member

achingbrain commented Oct 11, 2024

Actually server.close says:

Stops the server from accepting new connections and keeps existing connections.

So yes, it is necessary to close un-upgraded connections separately, but we should probably just close them immediately if they fail to upgrade, if we're not doing that already.

@tabcat
Copy link
Contributor Author

tabcat commented Oct 11, 2024

Yes, they are aborted inside of the catch block of the upgrade call currently.

The problem is that maConn is added to this.connections inside of the then block of the upgrade. If close is called before the then block is ran then the connection cannot be closed and this.server.close never resolves.

@tabcat
Copy link
Contributor Author

tabcat commented Oct 11, 2024

It's possible there's something missing though - could you please add a test to show the problem you're seeing?

I can add a test. The test that I have been using exists in https:/tabcat/delay5 which I should be able to replicate in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

race condition causes @libp2p/tcp close method to hang
2 participants