s3 streaming upload allocates 5-10x more mem than the multipart chunk size (oom risk) and fails to saturate net tx (inefficient) #2390
Thanks for the deep dive here. This behavior is expected, and here is why:
When you set the expected file size to 10,000 GB, the chunk size is 1 GB (as you noted) due to the 10,000-part limit on S3 multipart uploads. However, since there are 10 threads and each thread has to buffer its chunk/part in memory, peak memory usage increases by 10x. Given that, the best way to avoid the OOM errors is to either decrease the thread count or use a file instead of stdin as the source of your data. I am not sure there is much we can do architecturally in the CLI, given the nature of unrewindable streams and the constraints of the S3 API, to improve this performance. Let us know if that makes sense. It might be worth improving our documentation to make this clearer. |
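The arithmetic in that explanation can be sketched as follows. This is a simplified model, not the CLI's actual accounting: the 10,000-part limit is S3's documented maximum, the 8 MB floor is the CLI's default chunk size, and the per-thread-buffers-one-chunk assumption comes from the comment above.

```python
# Simplified model of why a large --expected-size inflates memory:
# the chunk size is derived from S3's 10,000-part limit, and each
# upload thread buffers one full chunk. Sketch only, not the CLI's
# real accounting.
MAX_PARTS = 10_000            # S3 multipart upload part-count limit
MIN_CHUNKSIZE = 8 * 1024**2   # CLI default minimum chunk size (8 MB)

def estimated_peak_memory(expected_size: int, num_threads: int = 10) -> int:
    """Estimate peak buffer memory for a streaming multipart upload."""
    chunksize = max(MIN_CHUNKSIZE, -(-expected_size // MAX_PARTS))  # ceil div
    return num_threads * chunksize

GB = 1024**3
# An --expected-size of 10,000 GB forces ~1 GB chunks, so 10 threads
# buffer ~10 GB in total.
print(estimated_peak_memory(10_000 * GB) // GB)  # prints 10
```

With a small expected size the 8 MB floor dominates instead, giving roughly 80 MB across 10 threads, which matches the "~1x chunk size plus small overhead" intuition from the original report.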
Ah ok, 10 threads explains the mem usage, thanks for the explanation. (Docs on controlling thread count, for reference.) But what about streaming |
Hello, I'm running into issues with aws s3 cp running out of memory, so I thought I would reply to this thread since it looks pertinent.
Since I don't have much memory on the machine, I set up the following configuration in ~/.aws/config. There should be only one thread running, so I anticipate around 100 MB of memory usage. Am I misunderstanding the configuration's effect? |
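The commenter's actual configuration did not survive in this copy of the thread. A configuration along these lines would produce the single-thread, ~100 MB expectation they describe (the key names are the AWS CLI's documented `s3` settings; the values here are illustrative, not the commenter's):

```ini
# Illustrative ~/.aws/config restricting aws s3 cp to one upload
# thread with 100 MB chunks (example values, not the commenter's
# actual settings)
[default]
s3 =
  max_concurrent_requests = 1
  multipart_chunksize = 100MB
```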
If this is still occurring with a more recent version of the AWS CLI, please let us know. Thanks! |
Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one. |
I'm trying to stream a large (>5GB) byte stream from stdin to s3 using:
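The exact command did not survive in this copy of the issue; a typical invocation of this stdin-streaming pattern looks like the following (the producer, bucket, key, and size are placeholders, not the reporter's actual values):

```shell
# Stream stdin to S3; "-" tells aws s3 cp to read from stdin.
# --expected-size lets the CLI choose a multipart chunk size up
# front, since an unrewindable stream's length is unknown.
# Placeholder bucket/key and size; adjust to your data.
some-producer | aws s3 cp - s3://my-bucket/my-object \
    --expected-size 10737418240   # ~10 GB
```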
but I keep running into OOM kills (and also inefficient net tx) when I use large values for `--expected-size`. I did a little digging into resource usage using `dstat` and `pv`, and I'm observing that:

- `aws s3 cp` allocates 7-10x more mem than the multipart upload chunk size, whereas I'd expect it to allocate ~1x chunk size + a small constant overhead (<100MB)
- `aws s3 cp` fails to saturate net tx, alternating between two modes of resource usage
- `gsutil cp foo s3://foo` achieves ~40-50 MB/s net tx from the same host

Here are sample commands I'm using to observe these, along with the resource usage I observed for each:
Sample output:
I'm running awscli 1.11.36 on k8s on aws ec2:
Two other (old, closed) issues that sound possibly related: