
DynamoDbAsyncClient - Exception on connection reset #660

Closed
johansnow opened this issue Aug 17, 2018 · 4 comments
Labels
bug This issue is a bug. Netty NIO Client pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@johansnow

When using DynamoDbAsyncClient, we intermittently get the following exception, without any resulting problems in our application layer.

2018-08-17 08:49:18.862 WARN  1 [aws-java-sdk-NettyEventLoop-2-1] --- [userId: eventId:] i.n.c.DefaultChannelPipeline : An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. java.io.IOException: Connection reset by peer
	at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:283)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:236)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:226)
	at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:382)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:546)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:500)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
	at java.base/java.lang.Thread.run(Thread.java:844)

Given that we are using the DynamoDbAsyncClient, and this problem has already been reported for the SQS client (see #522), the actual problem appears to be in the underlying implementation.

Expected Behavior

A connection reset should not surface as an unhandled exception in the pipeline.

Current Behavior

The exception appears intermittently in the logs.

Possible Solution

This is just a first idea, and I have not looked at the layers below AmazonAsyncHttpClient, but the exception we get matches what happens in the following scenario:

  1. the channel is closed (all handlers are removed on channel close)
  2. an event is then raised on the channel context

One possibility is that some part of the RequestPipeline completes its execution after the connection has been reset, for example a scheduled future execution (AsyncRetryableStage), or the async execution of UnwrapResponseContainer / AfterExecutionInterceptorStage.

Steps to Reproduce (for bugs)

Execute enough DynamoDbAsyncClient ::putItem, ::getItem, or ::updateItem requests and the problem should surface. It may only appear on a system under load or undergoing scaling, rather than in a simple test setup with one client sending requests.
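The reproduction described above can be sketched as a simple load loop (a minimal illustration, assuming AWS SDK for Java 2.x with default credentials and region; "test-table" and the "id" string partition key are placeholder names, not from this issue):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class ReproduceSketch {
    public static void main(String[] args) {
        // Assumes a pre-existing table named "test-table" with a string
        // partition key "id" (both placeholders).
        try (DynamoDbAsyncClient client = DynamoDbAsyncClient.create()) {
            for (int i = 0; i < 100_000; i++) {
                PutItemRequest request = PutItemRequest.builder()
                        .tableName("test-table")
                        .item(Map.of("id", AttributeValue.builder().s("item-" + i).build()))
                        .build();
                // Under sustained load, the server eventually resets pooled
                // connections and the WARN from the report may appear.
                CompletableFuture<?> response = client.putItem(request);
                response.join();
            }
        }
    }
}
```

This requires live AWS credentials and an existing table, so it is a reproduction sketch rather than a self-contained test.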

Context

It's never nice to have exceptions that show up from time to time; just because they have not caused problems so far doesn't mean they won't in the future. This issue does not currently seem to affect our application, but given its intermittent nature, there is a risk it could cause infrequent errors that are time-consuming and hard to track down. Beyond that, it clutters our logs, and we'd rather not filter at this level.

Your Environment

SDK: 2 preview 10
Java: openjdk:10-jdk-slim
OS: Debian Buster

@dagnir dagnir added bug This issue is a bug. Netty NIO Client labels Aug 20, 2018
@mspiegle

mspiegle commented Nov 13, 2018

I am seeing this too. I performed a BatchWriteItem with 25 items and captured the packets with tcpdump. As the capture below shows, the SDK opened 2 TCP connections to send the requests, and AWS eventually closed the connections via a reset packet (which corresponded perfectly with the exception I received). The SDK appears to keep connections open for later re-use, but there is probably a server-side timeout in AWS that prevents idle persistent connections from hanging around too long. I see this on my local workstation and when running in AWS Lambda. Strangely, there were many times when AWS did close the connection via reset and I did not get the associated exception, so maybe there is a bug?

15:19:40.963556 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [S], seq 1337719888, win 65535, options [mss 1366,nop,wscale 6,nop,nop,TS val 708461465 ecr 0,sackOK,eol], length 0
15:19:41.035145 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [S.], seq 4210689071, ack 1337719889, win 8190, options [mss 1360,nop,wscale 6,nop,nop,sackOK], length 0
15:19:41.035270 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [.], ack 1, win 4096, length 0
15:19:41.047243 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [S], seq 2496625306, win 65535, options [mss 1366,nop,wscale 6,nop,nop,TS val 708461547 ecr 0,sackOK,eol], length 0
15:19:41.060267 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [P.], seq 1:132, ack 1, win 4096, length 131
15:19:41.105941 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [.], ack 1, win 232, length 0
15:19:41.119496 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [S.], seq 991235395, ack 2496625307, win 8190, options [mss 1360,nop,wscale 6,nop,nop,sackOK], length 0
15:19:41.119595 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 1, win 4096, length 0
15:19:41.124524 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [P.], seq 1:132, ack 1, win 4096, length 131
15:19:41.132604 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [.], ack 132, win 248, length 0
15:19:41.136690 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [.], seq 1:1361, ack 132, win 248, length 1360
15:19:41.136764 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [.], seq 1361:2721, ack 132, win 248, length 1360
15:19:41.136793 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [.], ack 2721, win 4053, length 0
15:19:41.137043 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [.], seq 2721:4081, ack 132, win 248, length 1360
15:19:41.137136 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [P.], seq 4081:5339, ack 132, win 248, length 1258
15:19:41.137159 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [.], ack 5339, win 4012, length 0
15:19:41.139771 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [.], ack 5339, win 4096, length 0
15:19:41.182329 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [P.], seq 132:258, ack 5339, win 4096, length 126
15:19:41.220531 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], ack 132, win 171, length 0
15:19:41.220798 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], ack 132, win 232, length 0
15:19:41.220953 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], ack 132, win 248, length 0
15:19:41.223523 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], seq 1:1361, ack 132, win 248, length 1360
15:19:41.223673 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], seq 1361:2721, ack 132, win 248, length 1360
15:19:41.223743 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 2721, win 4053, length 0
15:19:41.223926 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], seq 2721:4081, ack 132, win 248, length 1360
15:19:41.224218 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [P.], seq 4081:5339, ack 132, win 248, length 1258
15:19:41.224256 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 5339, win 4028, length 0
15:19:41.224653 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 5339, win 4096, length 0
15:19:41.233940 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [P.], seq 132:258, ack 5339, win 4096, length 126
15:19:41.256049 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [P.], seq 5339:5390, ack 258, win 248, length 51
15:19:41.256173 IP 10.70.24.3.52470 > 52.119.232.62.443: Flags [.], ack 5390, win 4095, length 0
15:19:41.305734 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [P.], seq 5339:5390, ack 258, win 248, length 51
15:19:41.305849 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 5390, win 4095, length 0
15:19:41.309565 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], seq 258:1618, ack 5390, win 4096, length 1360
15:19:41.309672 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [P.], seq 1618:2159, ack 5390, win 4096, length 541
15:19:41.381295 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [.], ack 2159, win 336, length 0
15:19:41.407633 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [P.], seq 5390:5697, ack 2159, win 336, length 307
15:19:41.407749 IP 10.70.24.3.52471 > 52.119.232.62.443: Flags [.], ack 5697, win 4091, length 0
15:20:28.119218 IP 52.119.232.62.443 > 10.70.24.3.52470: Flags [R.], seq 5390, ack 258, win 9300, length 0
15:20:29.744864 IP 52.119.232.62.443 > 10.70.24.3.52471: Flags [R.], seq 5697, ack 2159, win 9300, length 0
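A possible client-side mitigation for the idle-connection resets shown in the capture above is to have the connection pool close idle connections before the server does. A minimal configuration sketch, assuming AWS SDK for Java 2.x where NettyNioAsyncHttpClient exposes these builder options (the durations are illustrative, not recommendations):

```java
import java.time.Duration;

import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;

public class ClientConfigSketch {
    public static DynamoDbAsyncClient buildClient() {
        return DynamoDbAsyncClient.builder()
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        // Reap pooled connections idle longer than this, so the
                        // client closes them before the server resets them.
                        .connectionMaxIdleTime(Duration.ofSeconds(30))
                        // Optionally cap total connection lifetime as well.
                        .connectionTimeToLive(Duration.ofMinutes(5)))
                .build();
    }
}
```

This does not fix the unhandled-exception bug itself, but it can reduce how often the client reads from a connection the server has already reset.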

@zoewangg
Contributor

Thank you for reporting. This has been fixed in #834: the exception will now be retried by the SDK, and we should no longer let exceptions get past the end of the pipeline.
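The general shape of the fix described in the comment above, retrying an async call whose future failed with a connection-level IOException instead of letting it escape, can be sketched in plain Java. This is an illustration of the retry pattern, not the SDK's actual AsyncRetryableStage code; `withRetries` and `maxAttempts` are hypothetical names:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

public class RetrySketch {

    // Runs the supplied async operation, retrying up to maxAttempts times in
    // total when the returned future fails with an IOException (for example
    // "Connection reset by peer").
    public static <T> CompletableFuture<T> withRetries(Supplier<CompletableFuture<T>> operation,
                                                       int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> operation,
                                    int attemptsLeft,
                                    CompletableFuture<T> result) {
        operation.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            // Dependent stages may wrap the original failure in a CompletionException.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause() : error;
            if (attemptsLeft > 1 && cause instanceof IOException) {
                attempt(operation, attemptsLeft - 1, result); // retry connection-level failure
            } else {
                result.completeExceptionally(error); // out of attempts or non-retryable
            }
        });
    }
}
```

With a supplier that fails twice with an IOException and then succeeds, `withRetries(op, 5)` completes normally on the third attempt.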

@zoewangg
Contributor

The fix has been released as part of 2.1.0. Closing the issue; feel free to reopen if you continue to see it.

@Eyal-Shalev

@zoewangg I'm experiencing a very similar bug in #900 (using version 2.1.3).

@justnance justnance added pending-release This issue will be fixed by an approved PR that hasn't been released yet. and removed awaiting-release labels Apr 19, 2019
aws-sdk-java-automation added a commit that referenced this issue Nov 6, 2019

Pull request: release <- staging/128edac6-45eb-43b4-921f-ac7dd2fe4fa8

6 participants