Incorrect MultiPartUpload Chunksize #789

Closed
alex-kharlamov opened this issue Sep 18, 2023 · 13 comments · Fixed by #888

Comments

@alex-kharlamov

Description:

I have encountered an issue with the MultiPartUpload functionality in the s3fs library where the chunk size used during upload appears incorrect.

Environment Information:

  • s3fs Version: 2023.9.1
  • Python Version: 3.9
  • Operating System: Ubuntu 22.04

Issue Details:

When using s3fs to upload large files to an S3 bucket on Cloudflare R2, I noticed that the chunk sizes used for MultiPartUpload are not consistent across parts, which R2 rejects.

Expected Behavior:

I expect s3fs to use the same chunk size for a file with body (b"1" * 5 * 2**30) + b"kek" as for one with body b"1" * 5 * 2**30.

Steps to Reproduce:

Working example:

import fsspec
import io

with fsspec.open('s3://path', mode='wb') as f:
    buffer = io.BytesIO(b"1" * 5 * 2**30)
    f.write(buffer.getvalue())

This code works perfectly.

But if we change the buffer size to:

with fsspec.open('s3://path', mode='wb') as f:
    buffer = io.BytesIO((b"1" * 5 * 2**30) + b"kek")
    f.write(buffer.getvalue())

This code gives the error:
ClientError: An error occurred (InvalidPart) when calling the CompleteMultipartUpload operation: All non-trailing parts must have the same length.

Many thanks for considering my request.

@martindurant
Member

That is most unfortunate - AWS S3 doesn't have this limitation so long as each chunk is big enough. An S3File could in theory be configured to always send a specific size (in _upload_chunk), and retain the remainder in the buffer, but it would be annoying to code and only get used by niche backends that need it (perhaps only R2).
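The "send a specific size and retain the remainder" idea can be sketched as a small helper: split the buffered bytes into equal-size parts and hand the remainder back to stay in the buffer. This is a hypothetical illustration of the strategy, not the actual s3fs `_upload_chunk` implementation.

```python
def split_fixed(buffer: bytes, part_size: int):
    """Split buffer into equal part_size chunks plus a remainder to retain.

    Hypothetical helper sketching the fixed-size strategy; not s3fs API.
    """
    # Number of bytes that form complete, equal-size parts
    full = len(buffer) - len(buffer) % part_size
    parts = [buffer[i:i + part_size] for i in range(0, full, part_size)]
    # The remainder would stay in the write buffer until more data
    # arrives or the file is closed (the trailing part may be any size).
    return parts, buffer[full:]
```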

@plaflamme

FWIW: arrow made a fix for this apache/arrow#34363
FWIW2: according to this comment, it sounds unlikely that Cloudflare will change their implementation

@martindurant
Member

it sounds unlikely that Cloudflare will change their implementation

It would be nice, but the comment doesn't promise anything in the near term.

@chongzhang

I had the same issue. When I added a debug line after https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L2269 to print out data1_size, it printed a value smaller than self.blocksize. The sizes of the chunks also differ from one another (not just the final one), e.g. 1486 and 1371, even though the blocksize is the default 5242880. So the following if logic changes data1, which generates parts of different sizes.

Any idea why the read to data1 doesn't return the blocksize if it's not the last part?

@martindurant
Member

Any idea why the read to data1 doesn't return the blocksize

In general, read() is not required to return all the bytes you request, but I don't see why an io.BytesIO would ever return less.
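The short-read behavior mentioned here is standard Python file semantics: `read(n)` may return fewer than `n` bytes, so code that needs exactly `n` bytes has to loop. A minimal sketch of such a loop (the helper name is mine, not from s3fs):

```python
import io


def read_exact(f, n: int) -> bytes:
    """Read exactly n bytes from file-like f, looping over short reads.

    Returns fewer bytes only at EOF. Hypothetical helper for illustration.
    """
    chunks = []
    remaining = n
    while remaining:
        b = f.read(remaining)
        if not b:
            break  # EOF reached before n bytes were available
        chunks.append(b)
        remaining -= len(b)
    return b"".join(chunks)
```

For an in-memory `io.BytesIO` a single `read(n)` always returns the full `n` bytes (short of EOF), which is why a short read here is surprising.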

@chongzhang

Could it be related to the cache? I tried different cache options but still got the same error.

@martindurant
Member

Could it be related to the cache?

Which cache, what do you mean?

@chongzhang

https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L2020
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L1568
TBH I am new to s3fs and not sure it's related to this different chunk size.

@martindurant
Member

You have linked to two class definitions? Both have a default block size of 5 * 2**20, but _upload_chunk refers consistently to self.blocksize (i.e., a single value).

@matthiaskern

matthiaskern commented Dec 14, 2023

I'd be interested in helping to get this fixed as we're running into this in production with llama_index.

@martindurant
Member

@matthiaskern , all help is welcome

@hantusk

hantusk commented Jul 5, 2024

Ran into this today as well fwiw.

@martindurant
Member

OK, so to summarise:

Currently, s3fs's file will flush its whole buffer, whatever the size, whenever a write() puts it over the block size. This is fine with AWS S3 (and minio and others), but R2 requires each non-trailing part to be the same size.

The solution is to allow/require S3File to always push exactly one block size at a time, potentially needing multiple remote writes at flush time, and leaving some buffer data over.

Since the remote call happens only in one place, this shouldn't be hard to code up. Does someone want to take it on? It should not split writes by default, where variable part sizes are allowed.
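The proposed behavior can be sketched as a minimal writer that uploads exactly one blocksize per remote part, possibly making several remote calls per flush, and retains the remainder until close. `upload_part` stands in for the real remote call; all names here are hypothetical, not s3fs internals.

```python
class FixedPartWriter:
    """Sketch of the fixed-part-size strategy discussed above.

    Every non-trailing part sent to `upload_part` is exactly `blocksize`
    bytes; the remainder stays buffered. Illustration only, not s3fs code.
    """

    def __init__(self, upload_part, blocksize=5 * 2**20):
        self.upload_part = upload_part  # stand-in for the remote call
        self.blocksize = blocksize
        self.buffer = bytearray()

    def write(self, data: bytes):
        self.buffer.extend(data)
        # Possibly several remote writes per flush, each exactly blocksize.
        while len(self.buffer) >= self.blocksize:
            self.upload_part(bytes(self.buffer[:self.blocksize]))
            del self.buffer[:self.blocksize]

    def close(self):
        # The final (trailing) part may be any size.
        if self.buffer:
            self.upload_part(bytes(self.buffer))
            self.buffer.clear()
```

With a blocksize of 4 and eleven bytes written, this produces parts of sizes 4, 4, 3: all non-trailing parts equal, which is what R2 requires.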
