Incorrect MultiPartUpload Chunksize #789

Closed
alex-kharlamov opened this issue Sep 18, 2023 · 13 comments · Fixed by #888

Comments

@alex-kharlamov

Description:

I have encountered an issue with the MultiPartUpload functionality in the s3fs library where the chunk size used during upload appears incorrect.

Environment Information:

  • s3fs Version: 2023.9.1
  • Python Version: 3.9
  • Operating System: Ubuntu 22.04

Issue Details:

When using s3fs to upload large files to an S3 bucket on Cloudflare R2, I noticed that the chunk sizes used for MultiPartUpload are not consistent across parts, which R2 rejects.

Expected Behavior:

I expect s3fs to use the same chunk size for a file with body (b"1" * 5 * 2**30) + b"kek" as for one with body b"1" * 5 * 2**30.

Steps to Reproduce:

Working example:

import fsspec
import io

with fsspec.open('s3://path', mode='wb') as f:
    buffer = io.BytesIO(b"1" * 5 * 2**30)
    f.write(buffer.getvalue())

This code works perfectly.

But if we change the buffer size to:

with fsspec.open('s3://path', mode='wb') as f:
    buffer = io.BytesIO((b"1" * 5 * 2**30) + b"kek")
    f.write(buffer.getvalue())

This code gives the error:
ClientError: An error occurred (InvalidPart) when calling the CompleteMultipartUpload operation: All non-trailing parts must have the same length.

Many thanks for considering my request.

@martindurant
Member

That is most unfortunate - AWS S3 doesn't have this limitation so long as each chunk is big enough. An S3File could in theory be configured to always send a specific size (in _upload_chunk), and retain the remainder in the buffer, but it would be annoying to code and only get used by niche backends that need it (perhaps only R2).
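The "send a specific size and retain the remainder" idea can be sketched as a small helper: split the buffered bytes into equal-size parts and hand the remainder back to stay in the buffer. This is a hypothetical illustration of the strategy, not the actual s3fs `_upload_chunk` implementation.

```python
def split_fixed(buffer: bytes, part_size: int):
    """Split buffer into equal part_size chunks plus a remainder to retain.

    Hypothetical helper sketching the fixed-size strategy; not s3fs API.
    """
    # Number of bytes that form complete, equal-size parts
    full = len(buffer) - len(buffer) % part_size
    parts = [buffer[i:i + part_size] for i in range(0, full, part_size)]
    # The remainder would stay in the write buffer until more data
    # arrives or the file is closed (the trailing part may be any size).
    return parts, buffer[full:]
```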

@plaflamme

FWIW: arrow made a fix for this apache/arrow#34363
FWIW2: according to this comment, it sounds unlikely that Cloudflare will change their implementation

@martindurant
Member

it sounds unlikely that Cloudflare will change their implementation

It would be nice, but the comment doesn't promise anything in the near term.

@chongzhang

I had the same issue. When I added a debug line after https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L2269 to print out data1_size, it printed a value smaller than self.blocksize. The sizes of the chunks also differ from one another (not just the final one), e.g. 1486 and 1371, even though the blocksize is the default 5242880. So the following if logic changes data1, which generates parts of different sizes.

Any idea why the read to data1 doesn't return the blocksize if it's not the last part?

@martindurant
Member

Any idea why the read to data1 doesn't return the blocksize

In general, read() is not required to return all the bytes you request, but I don't see why an io.BytesIO would ever return less.
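The short-read behavior mentioned here is standard Python file semantics: `read(n)` may return fewer than `n` bytes, so code that needs exactly `n` bytes has to loop. A minimal sketch of such a loop (the helper name is mine, not from s3fs):

```python
import io


def read_exact(f, n: int) -> bytes:
    """Read exactly n bytes from file-like f, looping over short reads.

    Returns fewer bytes only at EOF. Hypothetical helper for illustration.
    """
    chunks = []
    remaining = n
    while remaining:
        b = f.read(remaining)
        if not b:
            break  # EOF reached before n bytes were available
        chunks.append(b)
        remaining -= len(b)
    return b"".join(chunks)
```

For an in-memory `io.BytesIO` a single `read(n)` always returns the full `n` bytes (short of EOF), which is why a short read here is surprising.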

@chongzhang

Could it be related to the cache? I tried different cache options but still got the same error.

@martindurant
Member

Could it be related to the cache?

Which cache, what do you mean?

@chongzhang

https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L2020
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L1568
TBH I am new to s3fs and not sure it's related to this different chunk size.

@martindurant
Member

You have linked to two class definitions? Both have a default block size of 5 * 2**20, but _upload_chunk refers consistently to self.blocksize (i.e., a single value).

@matthiaskern

matthiaskern commented Dec 14, 2023

I'd be interested in helping to get this fixed as we're running into this in production with llama_index.

@martindurant
Member

@matthiaskern , all help is welcome

@hantusk

hantusk commented Jul 5, 2024

Ran into this today as well fwiw.

@martindurant
Member

OK, so to summarise:

Currently, s3fs's file will flush its whole buffer, whatever the size, whenever a write() puts it over the block size. This is fine with AWS S3 (and minio and others), but R2 requires each non-trailing part to be the same size.

The solution is to allow/require S3File to always push exactly one block size at a time, potentially needing multiple remote writes at flush time, and leaving some buffer data over.

Since the remote call happens only in one place, this shouldn't be hard to code up. Does someone want to take it on? It should not split writes by default, where variable part sizes are allowed.
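The proposed behavior can be sketched as a minimal writer that uploads exactly one blocksize per remote part, possibly making several remote calls per flush, and retains the remainder until close. `upload_part` stands in for the real remote call; all names here are hypothetical, not s3fs internals.

```python
class FixedPartWriter:
    """Sketch of the fixed-part-size strategy discussed above.

    Every non-trailing part sent to `upload_part` is exactly `blocksize`
    bytes; the remainder stays buffered. Illustration only, not s3fs code.
    """

    def __init__(self, upload_part, blocksize=5 * 2**20):
        self.upload_part = upload_part  # stand-in for the remote call
        self.blocksize = blocksize
        self.buffer = bytearray()

    def write(self, data: bytes):
        self.buffer.extend(data)
        # Possibly several remote writes per flush, each exactly blocksize.
        while len(self.buffer) >= self.blocksize:
            self.upload_part(bytes(self.buffer[:self.blocksize]))
            del self.buffer[:self.blocksize]

    def close(self):
        # The final (trailing) part may be any size.
        if self.buffer:
            self.upload_part(bytes(self.buffer))
            self.buffer.clear()
```

With a blocksize of 4 and eleven bytes written, this produces parts of sizes 4, 4, 3: all non-trailing parts equal, which is what R2 requires.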
