Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lwip_sock_tcp / sock_async: received events before calling sock_accept() are lost due to race condition. #16303

Open
iosabi opened this issue Apr 10, 2021 · 3 comments
Assignees
Labels
Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@iosabi
Copy link
Contributor

iosabi commented Apr 10, 2021

Description

In the tests/lwip application, tcp.c defines a TCP server that receives a connection, dumps all the bytes on the terminal as hex and prints messages when the connection is accepted and closed.

The normal dynamic of the test would be that _tcp_accept is called when a new connection in the server socket is received which calls sock_tcp_accept() to accept the connection. sock_tcp_accept returns a socket number (file descriptor) to use with tcp_sock_recv() for example, which is only assigned to the underlying connection when sock_tcp_accept().

However, underneath the netconn connection has already been accepted before sock_tcp_accept() is called, the SYN,ACK packet was already sent and the other end may have already sent bytes to this connection. All these events call the _netconn_cb function in pkg/lwip/contrib/sock/tcp/lwip_sock.c but the conn->callback_arg.socket for the given netconn* conn is set to -1 until sock_tcp_accept() is called and a socket value is assigned. The code in _netconn_cb ignores the callback altogether if the socket is not assigned, which means that both the received data and the FIN are ignored if they happen before the application code has a chance to both call sock_tcp_accept() AND register the recv callback with sock_tcp_event_init(_ev_queue).

The easiest way to see this problem is by connecting to the a TCP server in the test/lwip application and sending data immediately, (for example echo hello | nc 1.2.3.4 12341 to a TCP server created with tcp server start 1234). The server should print the connected message, the hex dump of hello and the reset message; but instead it only prints the connected message. See below for a programmatic way to reproduce it.

I'd propose two fix this issue with by computing the flags variable in _netconn_cb even if socket is -1 and even if there's no callback registered, and storing a mask of "pending" flags somewhere in the struct conn* object that would call the tcp callback when registered with sock_tcp_event_init. Otherwise I don't see a way to use the API without potentially missing events.

Steps to reproduce the issue

Add the following test to lwip/tests/01-run.py and to list of tests at the bottom of the file. Run it with:
RIOT_CI_BUILD=1 make QUIET=0 -C tests/lwip flash test

def test_tcpv6_accept_send(board_group, application, env=None):
    """Test that the TCP server receives data right after accept()."""
    if any(b.name != "native" for b in board_group.boards):
        # run test only with native
        print("SKIP_TEST INFO found non-native board")
        return
    env_server = os.environ.copy()
    if env is not None:
        env_server.update(env)
    env_server.update(board_group.boards[1].to_env())
    with pexpect.spawnu(MAKE, ["-C", application, "term"], env=env_server,
                        timeout=DEFAULT_TIMEOUT) as server:
        port = random.randint(0x0000, 0xffff)
        server_ip = get_ipv6_address(server)
        server.logfile = sys.stderr # FIXME: remove this line

        try:
            connect_addr = socket.getaddrinfo(
                "%s%%tapbr0" % server_ip, port)[0][4]
        except socket.gaierror as e:
            print("SKIP_TEST INFO", e)
            return

        server.sendline(u"tcp server start %d" % port)
        # wait for neighbor discovery to be done
        time.sleep(1)

        with socket.socket(socket.AF_INET6) as sock:
            # Connect, send and close the connection altogether before waiting
            # for the messages on the terminal.
            sock.connect(connect_addr)
            sock.send(b'abcd')
            sock.close()
            server.expect(u"TCP client \\[[0-9a-f:]+\\]:[0-9]+ connected")
            server.expect(u"00000000  61  62  63  64")
            server.expect(u"TCP connection to \\[[0-9a-f:]+\\]:[0-9]+ reset")

Expected results

Test passes. Namely, _tcp_recv is called when either sock.send() or sock.close() are called.

Actual results

The example lwip app never calls _tcp_recv so it never prints "00000000 61 62 63 64" on the terminal.

Versions

     Operating System: "Ubuntu" "20.04.2 LTS (Focal Fossa)"
               Kernel: Linux 5.4.0-70-generic x86_64 x86_64
           native gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
@iosabi
Copy link
Contributor Author

iosabi commented Apr 10, 2021

@miri64 FYI

@miri64 miri64 self-assigned this Apr 12, 2021
@miri64
Copy link
Member

miri64 commented Apr 12, 2021

@miri64 FYI

Received. The proposed fix sounds sane so far.

@jeandudey jeandudey added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking labels May 14, 2021
@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@miri64
Copy link
Member

miri64 commented Jul 20, 2021

Received. The proposed fix sounds sane so far.

Are you planning to provide this @iosabi?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

4 participants