Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame #3915

jifang · 2018-03-06T17:52:03Z

Description: When downloading a lot of medium-sized files from Nginx HTTP/2 server, if requests are frequently canceled (RST_STREAM), eventually the connection will timeout and not be able to send/receive any data from the server.

On the server side configure the Nginx server to support HTTP/2 with a self-signed certificate. And host about 100 files with size about 500k each.
On the client side, I built an Android test, using a single thread to download the file from file no.1 to file no.100 consequently. During each download, the request is canceled and then proceed to download next file. Eventually, the request will timeout. Depending on the setup the timeout may happen on different download. But once the testing environment is set, it always failed on the same file.

Server info: nginx/1.13.10
Client info: Android 6.0 with OKHttp 3.10.0

This issue is reproducible on multiple servers. For example, Go HTTP/2 server and AWS CloudFront.

A filtered (sid:75) server log shows that the header frame has been sent on 14:39:32 and then timed out after 60 seconds. Note, it is not always the header frame though. Sometimes it is the data frame not being sent out.

2018/03/05 14:39:32 [debug] 92282#0: *4 http2 frame type:1 f:5 l:14 sid:75
2018/03/05 14:39:32 [debug] 92282#0: *4 http2 HEADERS frame sid:75 depends on 0 excl:0 weight:16
2018/03/05 14:39:32 [debug] 92282#0: *4 http2 frame out: 00007FD4B082ABA0 sid:75 bl:1 len:122
2018/03/05 14:39:32 [debug] 92282#0: *4 http2 frame sent: 00007FD4B082ABA0 sid:75 bl:1 len:122
2018/03/05 14:40:32 [debug] 92282#0: *4 http2 send RST_STREAM frame sid:75, status:1

In the meantime, the client log shows the header is not received until 14:40:32

03-05 14:39:32.229 5979-5995/? V/VB_Network: Sending request https://10.0.0.186/files/37 on Connection{10.0.0.186:443, proxy=DIRECT hostAddress=/10.0.0.186:443 cipherSuite=TLS_RSA_WITH_AES_128_CBC_SHA protocol=h2}
                                             User-Agent: 
                                             Host: 10.0.0.186
                                             Connection: Keep-Alive
                                             Accept-Encoding: gzip
03-05 14:40:32.397 5979-5995/? V/VB_Network: Received response for https://10.0.0.186/files/37 (h2) in 60167.4ms
                                             server: nginx/1.13.10
                                             date: Mon, 05 Mar 2018 22:39:32 GMT
                                             content-type: application/octet-stream
                                             content-length: 562176
                                             last-modified: Sun, 04 Mar 2018 00:23:13 GMT
                                             etag: "5a9b3c71-89400"
                                             accept-ranges: bytes
03-05 14:40:32.400 5979-5995/? E/VB_Test: stream was reset: PROTOCOL_ERROR https://10.0.0.186/files/37

At first, I thought it is a server issue, so I filed a bug to the Nginx team. They replied that this is because the client never sends the WINDOW_UPDATE frame except after the initial connection.

Server Full Log
Client Full Log

The test project is under [email protected]:jifang/nginx_bug_repo1.git

The text was updated successfully, but these errors were encountered:

yschimke · 2018-03-06T19:07:04Z

Thanks @jifang for including the github project to reproduce it. Not sure who is going to look at this first. n.b. Have you looked at the Http2 frame logging?

Excuse the kotlin

val activeLogger = getLogger(okhttp3.internal.http2.Http2::class.java.name)
activeLogger.level = Level.FINE

Should give output like

>> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
>> 0x00000000     6 SETTINGS
>> 0x00000000     4 WINDOW_UPDATE
<< 0x00000000    18 SETTINGS
>> 0x00000000     0 SETTINGS      ACK
>> 0x00000003    65 HEADERS       END_STREAM|END_HEADERS
<< 0x00000000     0 SETTINGS      ACK
<< 0x00000003   209 HEADERS       END_HEADERS
<< 0x00000003 14181 DATA
<< 0x00000003  2829 DATA
<< 0x00000003 16375 DATA
<< 0x00000003  8505 DATA
<< 0x00000003  5667 DATA
<< 0x00000003 16384 DATA
<< 0x00000003 16366 DATA
<< 0x00000003 11343 DATA
<< 0x00000003  5667 DATA
<< 0x00000003  5083 DATA          END_STREAM
>> 0x00000000     8 GOAWAY

jifang · 2018-03-06T22:13:54Z

Thanks @yschimke for the tips.
Here is the client log filtered by Http2
The test project is also updated.

swankjesse · 2018-03-06T22:51:36Z

I have a theory.

In our HTTP/2 code we don't ack bytes until they're delivered to the application. It's insufficient to get them to the Http2Stream; they actually need to be read from the stream.

I'm guessing that when a steam is cancelled we need to do something with the leftover data.

yschimke · 2018-03-07T08:05:44Z

Nice. That theory should be reasonably easy to repro with a test if so.

yschimke · 2018-03-12T06:09:05Z

Tagging for 3.11 as this seems hi-pri

…ream is canceled square#3915

fixed cases when: 1. Client cancels stream and does not update window 2. Client receives RstStream and does not update window square#3915

fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window square#3915

Added test cases when: 1. Client cancels stream and should update window for unread data 2. Client receives Data for closed (unknown) stream and should calculate that data into window square#3915

fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes that it can handle and closes stream square#3915

fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes that it can handle and closes stream square#3915 not fixed: 1. Push requests 2. One unknown case which causes failure of https:/wowex/okhttp-3915-test

fixed cases when: 1. Client cancels stream and does not update window for unread data 2. Client receives RstStream and does not update window for unread data 3. Client receives Data for closed (unknown) stream and does not calculate that data into window 4. Client receives more bytes that it can handle and closes stream square#3915 not fixed: 1. Push requests

fixed case when: 1. Client closes stream and does not update window for unread data square#3915 Test code: Call call = client.newCall(request); try (Response response = call.execute()) { response.body().source().read(new byte[1]); }

We rely on the application layer to read the response body buffer before sending WINDOW_UPDATE's. Previously we'd immediately throw a StreamResetException. This prevented the application layer from reading the buffer which in turn means we would not send WINDOW_UPDATE's. This has potential to deplete the flow-control window. #3915

natez0r · 2018-04-11T18:47:56Z

I have been debugging an issue with ExoPlayer and OkHttp Http/2 connections that I now think may be the same underlying issue as this one. I have a test project attached to the issue that I can get 100% reproduction of the SocketTimeoutException if you run the project on a Genymotion android emulator with the network throttled to GPRS or Edge.

Would love to have an idea of when I could grab a snapshot to test a potential fix to this. I tried the latest 3.11.0-SNAPSHOT from today (April 11th).

ojw28 · 2018-05-09T23:22:50Z

Is there any update on this issue? Thanks!

natez0r · 2018-05-10T14:35:33Z

Also looking for an update, we're still seeing this in production, causing any connection to a domain to fail until the socket gets removed from the connection pool

dave-r12 · 2018-05-10T15:22:38Z

Can you try the latest SNAPSHOT? Be sure to fully read the response body, that's our signal to notify the peer we are ready for more bytes.

If you're still seeing an issue, can you please provide some steps to reproduce? Or ideally a test case if that's possible.

natez0r · 2018-05-16T14:49:40Z

@ojw28 do you think it's feasible to have the loaders in exoplayer clean up cancelled requests by finishing reading the bytes?

As an aside @dave-r12, I personally don't know if it's reasonable to ask the application layer to do something like that when you don't need to do this with Http/1.1 connections and it doesn't break all subsequent connections to that domain. I will say that this is a huge issue for our video playback on android right now and I'm open to fixing it in the short term by doing a workaround but this is just my initial reaction.

If you're looking to reproduce, I have a test project attached to the issue that I can get 100% reproduction of the SocketTimeoutException if you run the project on a Genymotion android emulator with the network throttled to GPRS or Edge.

natez0r · 2018-05-21T20:58:53Z

bump, can I work with someone from square to figure out how to fix this outside of the application layer? this is a huge issue for us right now

ojw28 · 2018-05-22T00:48:14Z

I don't think it's reasonable to require the application layer to fully read the response body. What if it's a 2GB video or something?

swankjesse · 2018-05-22T01:42:30Z

@natez0r did you try the latest snapshot? If that fixes the problem I can cut a release.

natez0r · 2018-05-22T17:41:51Z

@swankjesse just tried now with (https://oss.sonatype.org/content/repositories/snapshots/com/squareup/okhttp3/okhttp/3.11.0-SNAPSHOT/okhttp-3.11.0-20180522.074215-92.pom) and unfortunately, I was still reproducing the issue in my test app

dave-r12 · 2018-05-26T18:25:10Z

does that mean that all other connections to that domain will have to wait for the exhausting of the response to complete before they could be multiplexed over the same socket?

Connections or requests? Assuming you mean requests and the connection flow-control window is full, then yes the server should not send us any more data frames until we free up space.

Do we only have to read whatever's actually been received, or does that mean we need to read the entire response the server was planning to serve up?

Should only be what we have buffered so far, not the entire response. There should be no blocking on IO.

#3915

ojw28 · 2018-05-29T14:41:57Z

So I think where we are with this is:

The application should call response.body().close() (we already do this in ExoPlayer).
OkHttp should do whatever is necessary when that happens (e.g. read and discard any buffered-but-not-yet-consumed data). @dave-r12 - Does the commit referenced above do this, or is there more work to be done?

Thanks!

dave-r12 · 2018-05-29T16:22:59Z

Yes, I believe the commit above does that (pending code review.)

natez0r · 2018-05-29T16:56:44Z

Awesome, thanks for the fix @dave-r12! much appreciated. If you'd like, I can run my test project on it once it is in a snapshot.

#3915

dave-r12 · 2018-05-30T12:28:01Z

@natez0r run your test project on latest snapshot?

natez0r · 2018-05-31T18:30:32Z

I tried it, but i am still seeing a socket timeout. I'm trying to investigate a bit further, but wanted to let you know I'm working on it.

dave-r12 · 2018-06-05T20:55:54Z

Shoot, ok. Let us know if we can help out at all. Ideally we can get an executable test case for this and then fix.

baconz · 2018-06-23T17:31:07Z

Hey @natez0r let me know if you'd like a second set of hands on this! We're eager to get this resolved.

jifang · 2018-06-23T22:49:18Z

I am able to verify that calling response.body().close(); had fixed this issue with the latest snapshot. Thanks!

natez0r · 2018-06-28T18:19:08Z

I got a chance to debug this a bit more today. It looks like my issue can occur when we've 'opened' the connection but have yet to begin reading the response but decide to cancel it.

natez0r · 2018-06-28T18:31:19Z

@dave-r12 I guess my issue is slightly different than this, but I am trying to figure out what the actual cause could be:

If we've opened a connection (gotten the response code), but have yet to start reading the body of the response and decide to cancel the request, does the underlying Http2Stream need to write a reset code to the server. I connected the debugger and it appears that the unacknowledgedBytes are 0 when closing the stream, which does not fire a windowUpdate (which sounds reasonable), but I still get sockettimeouts when attempting to make a new connection to the same domain after i've closed the response body of my connection that is in the state described in my first message

dave-r12 · 2018-06-29T12:52:07Z

Alright. I'll continue to stare at it but nothing is jumping out at me yet. If you could grab the HTTP/2 frame logs or can come up with an executable test case that will help as well. I'd really like to get to the root cause of this one 😄.

does the underlying Http2Stream need to write a reset code to the server.

That should already be happening if we haven't received all the data for the stream or there wasn't an existing error code for the stream.

natez0r · 2018-06-29T13:07:54Z

Hey Dave, I'll try to come up with a test case when I get back from vacation late next week. Thanks for your help on this!

…

On Fri, Jun 29, 2018, 8:52 AM Dave Roberge ***@***.***> wrote: Alright. I'll continue to stare at it but nothing is jumping out at me yet. If you could grab the HTTP/2 frame logs or can come up with an executable test case that will help as well. I'd really like to get to the root cause of this one 😄. does the underlying Http2Stream need to write a reset code to the server. That should already be happening if we haven't received all the data for the stream or there wasn't an existing error code for the stream. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3915 (comment)>, or mute the thread <https:/notifications/unsubscribe-auth/AAlEM-1QbpCAGxzLf1BOb4aLmfu_aHwjks5uBiMpgaJpZM4SfJuX> .

yschimke · 2018-07-06T07:27:56Z

@natez0r Just a headsup about 3.11 getting cut soon. Is the current status, that 3 good fixes have gone in but there is still an edge case we need to reproduce and fix?

natez0r · 2018-07-06T13:29:34Z

@yschimke that sounds correct to me. The issue I am having could probably be broken off into a separate issue.

I just got back from vacation today, I am going to try to write a test case to repro the issue before EOD (fingers crossed)

yschimke · 2018-07-06T14:14:35Z

If you think it's different enough, then please split it off.

natez0r · 2018-07-06T20:42:18Z

so, i cannot get it to reproduce yet with the the unit tests, I think it probably can be split off.

The issue that I am seeing is:

EDGE connection speed

open one connection and get the content size, begin reading a chunk of data from that connection (i used 8192 block size).

During that read, try to open a new connection to a URL on the same domain. Regardless of if you close the old connection or not, the second connection will fail for me.

Anywho, I will file an issue on monday.

dave-r12 · 2018-07-07T16:07:10Z

@natez0r I tried to write a test case to match what you found: #4127. What do you think?

noamtamim · 2019-01-07T06:26:58Z

@natez0r can you please share your test code that shows this still happens?

natez0r · 2019-01-07T18:56:19Z

@natez0r can you please share your test code that shows this still happens?

@noamtamim I was using google/ExoPlayer#4078 (comment) to reproduce it. I haven't used it in awhile, but the basic gist for the test project was to have two connections to the same domain and cancel one mid-request and then try to read the second

jifang changed the title ~~Frequent cancelling the HTTP/2 requests will cause server stop sending more data on the same connection~~ Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame Mar 6, 2018

yschimke added the bug Bug in existing code label Mar 12, 2018

yschimke added this to the 3.11 milestone Mar 12, 2018

dave-r12 mentioned this issue Mar 12, 2018

Do not alter connection flow-control window on SETTINGS frame. #3920

Merged

wowex pushed a commit to wowex/okhttp that referenced this issue Mar 14, 2018

Fixed issue 3915 when client did not update receive window in case st…

acaa1e5

…ream is canceled square#3915

wowex pushed a commit to wowex/okhttp that referenced this issue Mar 15, 2018

Issue 3915

80191b5

fixed cases when: 1. Client cancels stream and does not update window 2. Client receives RstStream and does not update window square#3915

wowex pushed a commit to wowex/okhttp that referenced this issue Mar 18, 2018

Issue 3915

f3f3468

Added test cases when: 1. Client cancels stream and should update window for unread data 2. Client receives Data for closed (unknown) stream and should calculate that data into window square#3915

dave-r12 mentioned this issue Apr 7, 2018

Defer StreamResetException until response body buffer is fully read. #3960

Merged

natez0r mentioned this issue Apr 11, 2018

Seeking ExoPlayer Rapidly with OkHttp with http2 connection causes sockettimeoutexception google/ExoPlayer#4078

Closed

dave-r12 added a commit that referenced this issue May 27, 2018

Ensure discarded HTTP/2 data frames free flow-control space.

261c088

#3915

dave-r12 mentioned this issue May 27, 2018

Ensure discarded HTTP/2 data frames free flow-control space. #4033

Merged

swankjesse pushed a commit that referenced this issue May 30, 2018

Ensure discarded HTTP/2 data frames free flow-control space. (#4033)

f9cef65

#3915

swankjesse pushed a commit that referenced this issue May 30, 2018

Ensure discarded HTTP/2 data frames free flow-control space. (#4033)

fb83491

#3915

swankjesse modified the milestones: 3.12, 3.11 Jul 5, 2018

yschimke mentioned this issue Jul 7, 2018

Request cancellation requires consuming remaining bytes for HTTP/2 connections #4126

Closed

yschimke closed this as completed Jul 7, 2018

dave-r12 mentioned this issue Jul 7, 2018

Confirm concurrent HTTP/2 requests with empty flow-control window. #4127

Merged

ddudek mentioned this issue Nov 7, 2018

[Android, ExoPlayer] Infinite buffering after cancelling okhttp connections when using HTTP/2.0 (e.g. AWS's S3 + CloudFare) TheWidlarzGroup/react-native-video#1315

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame #3915

Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame #3915

jifang commented Mar 6, 2018

yschimke commented Mar 6, 2018

jifang commented Mar 6, 2018

swankjesse commented Mar 6, 2018

yschimke commented Mar 7, 2018

yschimke commented Mar 12, 2018

natez0r commented Apr 11, 2018

ojw28 commented May 9, 2018

natez0r commented May 10, 2018

dave-r12 commented May 10, 2018

natez0r commented May 16, 2018

natez0r commented May 21, 2018

ojw28 commented May 22, 2018

swankjesse commented May 22, 2018

natez0r commented May 22, 2018

dave-r12 commented May 26, 2018

ojw28 commented May 29, 2018

dave-r12 commented May 29, 2018

natez0r commented May 29, 2018

dave-r12 commented May 30, 2018

natez0r commented May 31, 2018

dave-r12 commented Jun 5, 2018

baconz commented Jun 23, 2018

jifang commented Jun 23, 2018

natez0r commented Jun 28, 2018

natez0r commented Jun 28, 2018

dave-r12 commented Jun 29, 2018

natez0r commented Jun 29, 2018 via email

yschimke commented Jul 6, 2018

natez0r commented Jul 6, 2018

yschimke commented Jul 6, 2018

natez0r commented Jul 6, 2018

dave-r12 commented Jul 7, 2018

noamtamim commented Jan 7, 2019

natez0r commented Jan 7, 2019

Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame #3915

Frequent cancelling the HTTP/2 requests will cause client never sends WINDOW_UPDATE frame #3915

Comments

jifang commented Mar 6, 2018

yschimke commented Mar 6, 2018

jifang commented Mar 6, 2018

swankjesse commented Mar 6, 2018

yschimke commented Mar 7, 2018

yschimke commented Mar 12, 2018

natez0r commented Apr 11, 2018

ojw28 commented May 9, 2018

natez0r commented May 10, 2018

dave-r12 commented May 10, 2018

natez0r commented May 16, 2018

natez0r commented May 21, 2018

ojw28 commented May 22, 2018

swankjesse commented May 22, 2018

natez0r commented May 22, 2018

dave-r12 commented May 26, 2018

ojw28 commented May 29, 2018

dave-r12 commented May 29, 2018

natez0r commented May 29, 2018

dave-r12 commented May 30, 2018

natez0r commented May 31, 2018

dave-r12 commented Jun 5, 2018

baconz commented Jun 23, 2018

jifang commented Jun 23, 2018

natez0r commented Jun 28, 2018

natez0r commented Jun 28, 2018

dave-r12 commented Jun 29, 2018

natez0r commented Jun 29, 2018 via email

yschimke commented Jul 6, 2018

natez0r commented Jul 6, 2018

yschimke commented Jul 6, 2018

natez0r commented Jul 6, 2018

dave-r12 commented Jul 7, 2018

noamtamim commented Jan 7, 2019

natez0r commented Jan 7, 2019