0.18.1 reintroduces S3 multipart upload bug #2605
Comments
Can you give a reproducible example or any details about the size of the table and the amount of writes? A log report? I wrote the modified code for 0.18.1 and it seems to have fixed other people's problems; I'll take a look at it :)
Hi, thank you for the prompt response. Unfortunately, that happened in our enterprise dev environment, so I can't provide the proprietary data. I can provide two more observations, though:
Understandable if you can't share enterprise data, but even a censored error code or log would be great! That said, the behavior changed in 0.18.1 so that all writes accumulate in an in-memory buffer, which is flushed once it exceeds the threshold set in the config.
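The pre-fix buffering behavior described above can be sketched roughly as follows. This is a minimal Python sketch with hypothetical names, not the actual delta-rs internals: writes accumulate in memory, and the buffer is flushed as a single part only after it exceeds the threshold, so the uploaded parts end up with varying sizes rather than the fixed size S3 expects for every part except the last.

```python
class ExceedThresholdWriter:
    """Sketch of the 0.18.1 behavior: flush the WHOLE buffer once it
    exceeds the threshold, producing variable-size parts."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.buffer = bytearray()
        self.parts = []  # stands in for uploaded multipart parts

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)
        # Flushes only AFTER exceeding the threshold, and flushes
        # everything at once, so part sizes drift above the threshold.
        if len(self.buffer) > self.threshold:
            self.parts.append(bytes(self.buffer))
            self.buffer.clear()

    def close(self) -> list:
        if self.buffer:
            self.parts.append(bytes(self.buffer))
        return self.parts


w = ExceedThresholdWriter(threshold=8)
for chunk in (b"aaaaa", b"bbbbb", b"ccccc"):
    w.write(chunk)
parts = w.close()
print([len(p) for p in parts])  # part sizes vary: [10, 5]
```

Object stores reject uploads where a non-final part is smaller than the minimum, which is why small thresholds trigger the error reported here.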
That is the root cause. I did a test write with 5*2**20 and it worked this time. Can you push another release with this fix? Btw, the error message from before:
I don't control the release cycles, but you can compile from source with that fix! The current version will also work if you set that option.
# Description
Object stores expect a fixed length for every multipart upload part except the last one. The original logic flushed whenever the buffer exceeded the threshold, so part sizes varied. Now the buffer flushes exactly when the threshold is met, always emitting parts of the same fixed size, unless we're completing the transaction, in which case the final part is allowed to be smaller. Bumps the constant to reflect that the minimum part size most object stores expect is 5 MiB, and adds a UserWarning if the constant is configured below that. Also releases the GIL in more places by moving the flushing logic to a free function.
# Related Issue(s)
Closes #2605
# Documentation
See: [MultipartUpload](https://docs.rs/object_store/latest/object_store/trait.MultipartUpload.html) docs
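The fixed flushing scheme the description outlines can be sketched as follows. This is a hypothetical Python sketch, not the actual implementation (which lives in Rust inside delta-rs): parts are cut at exactly the configured fixed size, only the final part emitted on close may be smaller, and a UserWarning fires if the configured size is below the 5 MiB minimum.

```python
import warnings

MIN_PART_SIZE = 5 * 2**20  # 5 MiB, the minimum most object stores accept


class FixedPartWriter:
    """Sketch of the fix: every flushed part is exactly `part_size`
    bytes, except the final part emitted on close."""

    def __init__(self, part_size: int = MIN_PART_SIZE):
        if part_size < MIN_PART_SIZE:
            warnings.warn(
                f"part_size {part_size} is below the 5 MiB minimum most "
                "object stores require for non-final multipart parts",
                UserWarning,
            )
        self.part_size = part_size
        self.buffer = bytearray()
        self.parts = []  # stands in for uploaded multipart parts

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)
        # Flush only complete, fixed-size parts; keep the remainder.
        while len(self.buffer) >= self.part_size:
            self.parts.append(bytes(self.buffer[: self.part_size]))
            del self.buffer[: self.part_size]

    def close(self) -> list:
        # The final part alone is allowed to be smaller than part_size.
        if self.buffer:
            self.parts.append(bytes(self.buffer))
        return self.parts
```

With this scheme, every non-final part satisfies the store's fixed-size expectation regardless of how writes are chunked, which is the property the original exceed-then-flush logic violated.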
Environment
Delta-rs version: 0.18.1
Binding: Python
Environment:
Bug
What happened: The same error as in #890, but on S3 directly rather than on a non-S3 store
What you expected to happen:
How to reproduce it: Write regular-sized data to S3 with write_deltalake()
More details: 0.18.0 works fine