
aiohttp client corrupts uploads that use chunked encoding due to race condition in socket reuse #3281

Closed
Caligatio opened this issue Sep 21, 2018 · 8 comments
Labels
bug, client, needs-info (Issue is lacking sufficient information and will be closed if not provided), Stale

Comments

@Caligatio

Caligatio commented Sep 21, 2018

Long story short

It appears that the aiohttp client will corrupt chunked uploads while under stress, due to a race condition between an upload finishing and that TCP socket being reused.

Expected behaviour

Chunked uploads should work without issues

Actual behaviour

The problem manifested as my HTTP server complaining that the first line of an HTTP request (e.g. PUT /whatever) was an incorrect HTTP chunk size. Upon further investigation with Wireshark, a normal chunked HTTP upload randomly got a new HTTP request inserted mid-stream. The server therefore expected to read a chunk size but instead encountered an HTTP header.

I dug through the code and I believe the issue is the following (a simplified model of the race is sketched after this list):

  • The ClientSession starts the upload in client.py#L389
  • During an upload, the writer that actually writes to the client socket gets "backgrounded" as a Task in client_reqrep.py#L549
  • While the upload Task is still running, ClientSession tries to read the HTTP status code in client.py#L391. The bug crops up when the server responds with an HTTP status code before the upload finishes. In my specific case, the server would return an immediate HTTP 409 when the client tried to upload a multi-megabyte file to a file location that already had content.
  • The Task gets passed to ClientResponse in client_reqrep.py#L553
  • It appears the only thing that ever awaits _writer is wait_for_close(), which is never called
  • My theory at this point is that the underlying TCP socket is then handed off for reuse by a new request while the chunked upload is still happening in the background. The new HTTP request therefore interrupts the chunked upload as described previously.
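
To make the suspected race concrete, here is a simplified model in plain asyncio (not aiohttp's actual internals): the "socket" is just a list of byte strings, the body writer is a backgrounded task that nobody awaits, and a second request reuses the "socket" while the first body is still in flight.

```python
import asyncio

# Simplified model of the suspected race; the "socket" is a list of byte
# strings and the writer task mimics the backgrounded payload writer.

async def write_chunked_body(sock: list, chunks: list) -> None:
    for chunk in chunks:
        sock.append(chunk)       # part of request 1's chunked body
        await asyncio.sleep(0)   # yield to the event loop, like a real write would

async def main() -> None:
    sock: list = []

    # Request 1: the body writer is backgrounded and never awaited.
    writer = asyncio.ensure_future(
        write_chunked_body(sock, [b"5\r\nhello\r\n"] * 3 + [b"0\r\n\r\n"])
    )

    # The server answers early (e.g. with a 409), so the client considers the
    # request finished and the connection goes back to the pool.
    await asyncio.sleep(0)

    # Request 2 reuses the same connection immediately.
    sock.append(b"PUT /other HTTP/1.1\r\nTransfer-Encoding: chunked\r\n\r\n")

    await writer
    # The new request line now sits in the middle of request 1's chunk stream,
    # which matches the server-side parse error described above.
    print(b"".join(sock).decode())

asyncio.run(main())
```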

My workaround is currently to have the server unnecessarily read the whole request before sending back the HTTP status code.
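
For reference, that workaround looks roughly like this on an aiohttp server (a sketch with a hypothetical put_object handler and conflict check, not my real code; only the ordering of read() before the 409 matters):

```python
from aiohttp import web

async def put_object(request: web.Request) -> web.Response:
    # Hypothetical conflict check; the point is draining the body first.
    already_exists = True
    if already_exists:
        await request.read()  # unnecessarily read the whole body so the client's writer finishes
        return web.Response(status=409, text="already exists")
    await request.read()
    return web.Response(status=201)

app = web.Application()
app.router.add_put("/{name}", put_object)
# web.run_app(app)  # start as usual
```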

Steps to reproduce

On the server side, simply start an aiohttp HTTP server with 0 routes added. Create a client that attempts to upload several thousand files at once via HTTP POST or PUT with chunked encoding.
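
A self-contained reproducer along those lines might look like the sketch below (the port, file count, and chunk sizes are arbitrary choices of mine). With no routes registered, the server answers 404 before the body has been read, which is the early-response condition described above; the corruption should then show up as parse errors in the server log.

```python
import asyncio

import aiohttp
from aiohttp import web

async def chunked_body():
    # No Content-Length is known, so aiohttp sends the body with chunked encoding.
    for _ in range(1024):
        yield b"x" * 1024

async def upload(session: aiohttp.ClientSession, i: int) -> None:
    try:
        async with session.put(f"http://127.0.0.1:8080/file{i}", data=chunked_body()) as resp:
            await resp.read()
    except (aiohttp.ClientError, OSError):
        pass  # client-side errors are expected; the interesting output is the server log

async def main() -> None:
    app = web.Application()  # zero routes: every request is answered 404 immediately
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "127.0.0.1", 8080).start()

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(upload(session, i) for i in range(2000)))

    await runner.cleanup()

asyncio.run(main())
```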

Your environment

CentOS 7 with manually compiled Python 3.7.0 and aiohttp 3.4.4 (client).

@asvetlov
Member

GitMate.io thinks possibly related issues are #2643 (Aiohttp Client Unclosed session: using with), #3251 (Aiohttp Client request limit), #58 (aiohttp.HttpClient), #1293 (Use timeout in client), and #3175 (aiohttp client timeout does not work).

@asvetlov asvetlov added the bug label Sep 21, 2018
@asvetlov
Member

Looks like a real bug.
Would you prepare a PR for the fix?
Cancellation of the request payload writer task can help, I believe.

@Caligatio
Author

It looks like ClientRequest has a terminate() method which is never used but might be applicable here. The question is: when would it be appropriate to call it? As soon as you get something back from the ClientResponse's start()?

@asvetlov
Member

asvetlov commented Sep 21, 2018

I think the task should be terminated on error or on return to the connection pool (the close()/release() methods).

Consider the following scenario (sketched in code after the list):

  1. Client initiates a connection and sends request headers
  2. Server checks received headers (auth for example) and returns 200 OK.
  3. Client keeps the connection open and sends request body chunk by chunk.
  4. Server receives request chunks and sends response chunks back.
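
In other words, the response can legitimately start while the request body is still being written, so the writer cannot simply be cancelled as soon as the status line arrives. A server handler in that style might look like this (a sketch of my own; the echo handler and /echo route are not from this issue):

```python
from aiohttp import web

async def echo(request: web.Request) -> web.StreamResponse:
    resp = web.StreamResponse(status=200)
    await resp.prepare(request)                      # step 2: headers go out before the body is read
    async for chunk in request.content.iter_chunked(64 * 1024):
        await resp.write(chunk)                      # step 4: response chunks while the upload continues
    await resp.write_eof()
    return resp

app = web.Application()
app.router.add_put("/echo", echo)
```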

@Caligatio
Author

I started looking at what's happening. It appears as though cancel() is called on the writer task, but either it's not cancelling or the actual cancellation is delayed.

Regardless, do you know how to properly cancel a chunked upload? Normally a 0-length chunk indicates the upload is complete. Do we need to kill off the chunked upload and insert a 0-length chunk so the server knows to "release" the socket for another request?
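
For reference, a minimal sketch of the chunked wire format in question; the open point above is whether a cancelled writer should still emit the final zero-length chunk.

```python
# Chunked transfer encoding on the wire: each chunk is "<size-in-hex>\r\n<data>\r\n",
# and a zero-length chunk ("0\r\n\r\n") tells the server the body is complete.
body = (
    b"5\r\nhello\r\n"   # a 5-byte chunk
    b"0\r\n\r\n"        # terminating zero-length chunk
)
print(body)
```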

@Caligatio
Author

I may be over-engineering a solution to this problem but, assuming we need to send a 0-length chunk to close out the connection, I think 5 functions are going to have to change from being synchronous to asynchronous. The gist is that nothing awaits the cancellation of ClientResponse._writer in ClientResponse._cleanup_writer. If we make ClientResponse._cleanup_writer async so it can wait for a positive ACK on the cancellation, then ClientResponse.release, ClientResponse.close, ClientResponse._response_eof, and thus StreamReader.on_eof all have to go async as well.

Are you willing to support changing the 3 public functions to async?
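
The generic asyncio pattern being proposed here, independent of aiohttp's internals, is to cancel the writer task and then await it, so the caller only proceeds once the task has actually stopped. A minimal sketch (cleanup_writer and slow_body_write are illustrative names of mine, not the library's code):

```python
import asyncio
import contextlib

async def cleanup_writer(writer: asyncio.Task) -> None:
    # Cancel the backgrounded body writer and wait for the cancellation to land.
    writer.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await writer  # returns only once the task has really finished or been cancelled

async def main() -> None:
    async def slow_body_write() -> None:
        await asyncio.sleep(60)  # stands in for a long chunked upload

    task = asyncio.ensure_future(slow_body_write())
    await asyncio.sleep(0)       # let the writer start
    await cleanup_writer(task)   # anything calling this must itself be async
    assert task.cancelled()

asyncio.run(main())
```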

@clayg

clayg commented Nov 28, 2022

I don't think you would want the client to automatically send the 0-length chunk on "cancel"; some servers might interpret it as a successfully completed upload. In the error case, a server has to read the body of the HTTP request if it wants to re-use the connection, but because of the vagaries of clients it's best to throw the connection away.

Is this bug still blocked? If a transfer was cancelled prematurely (e.g. an Expect: 100-continue exchange returns an error) and the HTTP contract is broken, aiohttp shouldn't put the connection back into the pool. Cancelling the writer task sounds reasonable. @asvetlov what would that look like?
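
Until something like that lands in the library, a caller-side precaution in the same spirit might be to close (rather than release) any response where the server replied before the body was fully sent, so the connection never goes back to the pool. A sketch (careful_put is a hypothetical helper, and the status >= 400 check is just an example policy):

```python
import aiohttp

async def careful_put(session: aiohttp.ClientSession, url: str, data) -> int:
    resp = await session.put(url, data=data)
    if resp.status >= 400:
        resp.close()       # drop the underlying connection instead of reusing it
    else:
        await resp.read()  # reading to EOF lets aiohttp release the connection normally
    return resp.status
```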

@Dreamsorcerer
Member

I have a feeling this may have been fixed. If not, someone needs to provide a reproducer/test.

@Dreamsorcerer Dreamsorcerer added the needs-info Issue is lacking sufficient information and will be closed if not provided label Aug 8, 2024
@github-actions github-actions bot added the Stale label Sep 8, 2024
@github-actions github-actions bot closed this as not planned (won't fix / can't reproduce / duplicate / stale) Sep 16, 2024