Frequently cancelling HTTP/2 requests causes the client to never send WINDOW_UPDATE frames #3915
Comments
Thanks @jifang for including the GitHub project to reproduce it. Not sure who is going to look at this first. N.B. Have you looked at the HTTP/2 frame logging? Excuse the Kotlin
Should give output like
|
Thanks @yschimke for the tips. |
I have a theory. In our HTTP/2 code we don't ack bytes until they're delivered to the application. It's insufficient to get them to the Http2Stream; they actually need to be read from the stream. I'm guessing that when a stream is cancelled we need to do something with the leftover data. |
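To make the theory above concrete, here is a minimal toy model of a receiver's connection-level window. It is only a sketch, not OkHttp's implementation: `ReceiveWindowModel`, `streamsBeforeStall`, and the `creditOnCancel` switch are hypothetical names, and the model credits bytes back immediately instead of batching them into WINDOW_UPDATE frames. It assumes each cancelled stream leaves the same number of unread bytes behind.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not OkHttp code) of a connection-level receive window. */
final class ReceiveWindowModel {
    private long available; // bytes the peer is still allowed to send
    private final boolean creditOnCancel; // assumption: models the proposed fix
    private final Map<Integer, Long> unreadByStream = new HashMap<>();

    ReceiveWindowModel(long windowSize, boolean creditOnCancel) {
        this.available = windowSize;
        this.creditOnCancel = creditOnCancel;
    }

    /** Peer sends DATA. Returns false when the window cannot fit it, so the peer must stall. */
    boolean receiveData(int streamId, long byteCount) {
        if (byteCount > available) return false;
        available -= byteCount;
        unreadByStream.merge(streamId, byteCount, Long::sum);
        return true;
    }

    /** Application reads buffered bytes; only then are they credited back to the peer. */
    void applicationRead(int streamId, long byteCount) {
        long unread = unreadByStream.getOrDefault(streamId, 0L);
        long consumed = Math.min(unread, byteCount);
        unreadByStream.put(streamId, unread - consumed);
        available += consumed;
    }

    /** Cancels a stream. Without creditOnCancel the unread bytes are never acknowledged. */
    void cancel(int streamId) {
        Long unread = unreadByStream.remove(streamId);
        if (creditOnCancel && unread != null) available += unread;
    }

    long available() {
        return available;
    }

    /** Counts how many receive-then-cancel cycles succeed before the peer would stall. */
    static int streamsBeforeStall(
            boolean creditOnCancel, long windowSize, long leftoverBytes, int maxStreams) {
        ReceiveWindowModel model = new ReceiveWindowModel(windowSize, creditOnCancel);
        for (int i = 0; i < maxStreams; i++) {
            int streamId = 2 * i + 1; // client-initiated streams use odd ids
            if (!model.receiveData(streamId, leftoverBytes)) return i;
            model.cancel(streamId);
        }
        return maxStreams;
    }
}
```

With a 65,535-byte window (the HTTP/2 default initial size) and 16,384 unread bytes per cancelled stream, `streamsBeforeStall(false, 65_535, 16_384, 100)` returns 3, while the same call with `creditOnCancel` set to `true` returns 100. A larger window only postpones the stall.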
Nice. That theory should be reasonably easy to repro with a test if so. |
Tagging for 3.11 as this seems hi-pri |
fixed cases when: 1. Client cancels stream and does not update window 2. Client receives RstStream and does not update window square#3915
fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window square#3915
Added test cases when: 1. Client cancels stream and should update window for unread data 2. Client receives Data for closed (unknown) stream and should calculate that data into window square#3915
fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes than it can handle and closes stream square#3915
fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes than it can handle and closes stream square#3915 not fixed: 1. Push requests 2. One unknown case which causes failure of https:/wowex/okhttp-3915-test
fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes than it can handle and closes stream square#3915 not fixed: 1. Push requests
fixed case when: 1. Client closes stream and does not update window for unread data square#3915 Test code: Call call = client.newCall(request); try (Response response = call.execute()) { response.body().source().read(new byte[1]); }
We rely on the application layer to read the response body buffer before sending WINDOW_UPDATEs. Previously we'd immediately throw a StreamResetException. This prevented the application layer from reading the buffer, which in turn meant we would not send WINDOW_UPDATEs. This has the potential to deplete the flow-control window. #3915
I have been debugging an issue with ExoPlayer and OkHttp HTTP/2 connections that I now think may be the same underlying issue as this one. I have a test project attached to the issue with which I can reproduce the SocketTimeoutException 100% of the time if you run the project on a Genymotion Android emulator with the network throttled to GPRS or EDGE. I would love to have an idea of when I could grab a snapshot to test a potential fix for this. I tried the latest 3.11.0-SNAPSHOT from today (April 11th). |
Is there any update on this issue? Thanks! |
Also looking for an update, we're still seeing this in production, causing any connection to a domain to fail until the socket gets removed from the connection pool |
Can you try the latest SNAPSHOT? Be sure to fully read the response body, that's our signal to notify the peer we are ready for more bytes. If you're still seeing an issue, can you please provide some steps to reproduce? Or ideally a test case if that's possible. |
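As an application-level stopgap for the advice above, a caller could discard a bounded amount of the remaining body before closing it. This is only a sketch using `java.io`: `BodyDrainer`, `drainUpTo`, and the byte cap are hypothetical, and whether draining helps depends on the OkHttp version in use. The cap keeps the caller from pulling an entire large response just to release flow-control credit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: discard at most maxBytes from a response body stream. */
final class BodyDrainer {
    /** Reads and discards up to maxBytes, returning how many bytes were actually discarded. */
    static long drainUpTo(InputStream body, long maxBytes) {
        byte[] scratch = new byte[8192];
        long total = 0;
        try {
            while (total < maxBytes) {
                int want = (int) Math.min(scratch.length, maxBytes - total);
                int read = body.read(scratch, 0, want);
                if (read == -1) break; // end of body reached before the cap
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

A caller would pass the body's input stream (for example `response.body().byteStream()` in OkHttp) and then close the response as usual. Note that `read` can block while waiting for more data, so this trades a bounded amount of waiting for the window credit.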
@ojw28 do you think it's feasible to have the loaders in ExoPlayer clean up cancelled requests by finishing reading the bytes? As an aside @dave-r12, I personally don't know if it's reasonable to ask the application layer to do something like that when you don't need to do this with HTTP/1.1 connections and it doesn't break all subsequent connections to that domain. I will say that this is a huge issue for our video playback on Android right now and I'm open to fixing it in the short term with a workaround, but this is just my initial reaction. If you're looking to reproduce, I have a test project attached to the issue with which I can reproduce the SocketTimeoutException 100% of the time if you run the project on a Genymotion Android emulator with the network throttled to GPRS or EDGE. |
Bump. Can I work with someone from Square to figure out how to fix this outside of the application layer? This is a huge issue for us right now. |
I don't think it's reasonable to require the application layer to fully read the response body. What if it's a 2GB video or something? |
@natez0r did you try the latest snapshot? If that fixes the problem I can cut a release. |
@swankjesse just tried now with (https://oss.sonatype.org/content/repositories/snapshots/com/squareup/okhttp3/okhttp/3.11.0-SNAPSHOT/okhttp-3.11.0-20180522.074215-92.pom) and unfortunately, I was still reproducing the issue in my test app |
Connections or requests? Assuming you mean requests and the connection flow-control window is full, then yes the server should not send us any more data frames until we free up space.
Should only be what we have buffered so far, not the entire response. There should be no blocking on IO. |
So I think where we are with this is:
Thanks! |
Yes, I believe the commit above does that (pending code review.) |
Awesome, thanks for the fix @dave-r12! much appreciated. If you'd like, I can run my test project on it once it is in a snapshot. |
@natez0r run your test project on latest snapshot? |
I tried it, but I am still seeing a socket timeout. I'm trying to investigate a bit further, but wanted to let you know I'm working on it. |
Shoot, ok. Let us know if we can help out at all. Ideally we can get an executable test case for this and then fix. |
Hey @natez0r let me know if you'd like a second set of hands on this! We're eager to get this resolved. |
I am able to verify that calling |
I got a chance to debug this a bit more today. It looks like my issue can occur when we've 'opened' the connection but have yet to begin reading the response but decide to cancel it. |
@dave-r12 I guess my issue is slightly different from this, but I am trying to figure out what the actual cause could be: if we've opened a connection (gotten the response code), but have yet to start reading the body of the response and decide to cancel the request, does the underlying Http2Stream need to write a reset code to the server? |
Alright. I'll continue to stare at it but nothing is jumping out at me yet. If you could grab the HTTP/2 frame logs or can come up with an executable test case that will help as well. I'd really like to get to the root cause of this one 😄.
That should already be happening if we haven't received all the data for the stream or there wasn't an existing error code for the stream. |
Hey Dave, I'll try to come up with a test case when I get back from vacation late next week. Thanks for your help on this!
|
@natez0r Just a headsup about 3.11 getting cut soon. Is the current status, that 3 good fixes have gone in but there is still an edge case we need to reproduce and fix? |
@yschimke that sounds correct to me. The issue I am having could probably be broken off into a separate issue. I just got back from vacation today, I am going to try to write a test case to repro the issue before EOD (fingers crossed) |
If you think it's different enough, then please split it off. |
So, I cannot get it to reproduce yet with the unit tests, so I think it can probably be split off. The issue that I am seeing is: at EDGE connection speed, open one connection and get the content size, then begin reading a chunk of data from that connection (I used a block size of 8192). During that read, try to open a new connection to a URL on the same domain. Regardless of whether you close the old connection or not, the second connection will fail for me. Anyway, I will file an issue on Monday. |
@natez0r can you please share your test code that shows this still happens? |
@noamtamim I was using google/ExoPlayer#4078 (comment) to reproduce it. I haven't used it in a while, but the basic gist of the test project was to have two connections to the same domain, cancel one mid-request, and then try to read the second. |
Description: When downloading a lot of medium-sized files from an Nginx HTTP/2 server, if requests are frequently canceled (RST_STREAM), the connection eventually times out and can no longer send or receive any data from the server.
On the server side, configure the Nginx server to support HTTP/2 with a self-signed certificate, and host about 100 files of about 500k each.
On the client side, I built an Android test that uses a single thread to download the files consecutively, from file no. 1 to file no. 100. During each download, the request is canceled and the test then proceeds to download the next file. Eventually, a request times out. Depending on the setup, the timeout may happen on a different download, but once the testing environment is set, it always fails on the same file.
Server info: nginx/1.13.10
Client info: Android 6.0 with OkHttp 3.10.0
This issue is reproducible on multiple servers. For example, Go HTTP/2 server and AWS CloudFront.
A filtered (sid:75) server log shows that the header frame was sent at 14:39:32 and then timed out after 60 seconds. Note that it is not always the header frame, though; sometimes it is a data frame that is not sent out.
In the meantime, the client log shows the header is not received until 14:40:32.
At first, I thought it was a server issue, so I filed a bug with the Nginx team. They replied that this is because the client never sends a WINDOW_UPDATE frame except after the initial connection.
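For reference, this is roughly what the missing frame looks like on the wire. The sketch below follows the WINDOW_UPDATE layout from the HTTP/2 specification (a 9-byte frame header followed by a 4-byte window size increment). `WindowUpdateFrame` and `encode` are hypothetical names for illustration and are not OkHttp's writer.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder for an HTTP/2 WINDOW_UPDATE frame (type 0x8). */
final class WindowUpdateFrame {
    static final int TYPE_WINDOW_UPDATE = 0x8;

    /**
     * Encodes a WINDOW_UPDATE frame. Stream id 0 credits the connection-level window;
     * any other id credits that stream's window.
     */
    static byte[] encode(int streamId, int increment) {
        if (streamId < 0) throw new IllegalArgumentException("streamId must be >= 0");
        if (increment <= 0) throw new IllegalArgumentException("increment must be 1..2^31-1");
        ByteBuffer frame = ByteBuffer.allocate(9 + 4); // big-endian by default
        frame.put((byte) 0).put((byte) 0).put((byte) 4); // 24-bit payload length = 4
        frame.put((byte) TYPE_WINDOW_UPDATE);            // frame type
        frame.put((byte) 0);                             // no flags are defined
        frame.putInt(streamId & 0x7fffffff);             // reserved bit + 31-bit stream id
        frame.putInt(increment & 0x7fffffff);            // reserved bit + 31-bit increment
        return frame.array();
    }
}
```

For example, `WindowUpdateFrame.encode(0, 65_535)` yields a 13-byte frame that returns 65,535 bytes of credit to the connection-level window. If the client never emits such a frame for stream 0, the server eventually has no credit left and stops sending, which matches the Nginx team's explanation.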
Server Full Log
Client Full Log
The test project is at git@github.com:jifang/nginx_bug_repo1.git