Bulk delete support for object-store #2615

Closed
wjones127 opened this issue Aug 30, 2022 · 3 comments · Fixed by #4060
Labels: enhancement (Any new improvement worthy of an entry in the changelog)

@wjones127 (Member)

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This is useful for deleting a whole partitioned table of files without having to make a request for each file. Unfortunately, it seems only S3 implements this (even GCS doesn't offer it in its XML API).

S3: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html

Describe the solution you'd like

It might be worth adding this with an optimized implementation for S3, while the generic implementation makes individual delete requests concurrently.
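A minimal sketch of the proposed design, under stated assumptions: `SimpleStore`, `delete_many`, `GenericStore`, `S3LikeStore`, and `MAX_CONCURRENCY` are hypothetical names for illustration only, not the actual `object_store` API, and std threads stand in for async HTTP requests so the example is self-contained. The trait's default `delete_many` issues one delete per path concurrently, while the S3-like backend overrides it to send one batched call per chunk of up to 1,000 keys, the per-request limit of S3 `DeleteObjects`.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Hypothetical cap on in-flight single-object deletes (illustrative value).
const MAX_CONCURRENCY: usize = 8;
// S3 DeleteObjects accepts at most 1,000 keys per request.
const S3_MAX_KEYS_PER_REQUEST: usize = 1000;

// Hypothetical, simplified store trait; NOT the real `object_store` API.
trait SimpleStore: Sync {
    // Delete one object (one simulated API call).
    fn delete(&self, path: &str) -> Result<(), String>;

    // Generic default: one delete per path, run concurrently in waves of
    // at most MAX_CONCURRENCY threads (threads stand in for async requests).
    fn delete_many(&self, paths: &[String]) -> Vec<Result<(), String>> {
        let mut results = Vec::with_capacity(paths.len());
        for wave in paths.chunks(MAX_CONCURRENCY) {
            thread::scope(|s| {
                let handles: Vec<_> = wave
                    .iter()
                    .map(|p| s.spawn(move || self.delete(p)))
                    .collect();
                // Join in spawn order so results line up with `paths`.
                for h in handles {
                    results.push(h.join().expect("delete thread panicked"));
                }
            });
        }
        results
    }
}

// In-memory stand-in for a remote store that counts simulated API calls.
struct MemStore {
    objects: Mutex<HashSet<String>>,
    requests: AtomicUsize,
}

impl MemStore {
    fn with_objects(n: usize) -> Self {
        let objects: HashSet<String> =
            (0..n).map(|i| format!("table/part-{i}.parquet")).collect();
        MemStore { objects: Mutex::new(objects), requests: AtomicUsize::new(0) }
    }

    fn remove(&self, path: &str) -> Result<(), String> {
        if self.objects.lock().unwrap().remove(path) {
            Ok(())
        } else {
            Err(format!("not found: {path}"))
        }
    }
}

// Generic backend: keeps the default one-request-per-object `delete_many`.
struct GenericStore(MemStore);

impl SimpleStore for GenericStore {
    fn delete(&self, path: &str) -> Result<(), String> {
        self.0.requests.fetch_add(1, Ordering::SeqCst);
        self.0.remove(path)
    }
}

// S3-like backend: overrides `delete_many` with one call per key batch.
struct S3LikeStore(MemStore);

impl SimpleStore for S3LikeStore {
    fn delete(&self, path: &str) -> Result<(), String> {
        self.0.requests.fetch_add(1, Ordering::SeqCst);
        self.0.remove(path)
    }

    fn delete_many(&self, paths: &[String]) -> Vec<Result<(), String>> {
        let mut results = Vec::with_capacity(paths.len());
        for batch in paths.chunks(S3_MAX_KEYS_PER_REQUEST) {
            // One simulated API call deletes the whole batch.
            self.0.requests.fetch_add(1, Ordering::SeqCst);
            results.extend(batch.iter().map(|p| self.0.remove(p)));
        }
        results
    }
}

fn main() {
    let paths: Vec<String> = (0..2500).map(|i| format!("table/part-{i}.parquet")).collect();
    let generic = GenericStore(MemStore::with_objects(2500));
    let s3 = S3LikeStore(MemStore::with_objects(2500));

    let ok_generic = generic.delete_many(&paths).iter().all(|r| r.is_ok());
    let ok_s3 = s3.delete_many(&paths).iter().all(|r| r.is_ok());

    println!("generic: all ok = {ok_generic}, requests = {}", generic.0.requests.load(Ordering::SeqCst));
    println!("s3-like: all ok = {ok_s3}, requests = {}", s3.0.requests.load(Ordering::SeqCst));
}
```

Deleting 2,500 files this way should take 2,500 simulated requests on the generic backend versus 3 on the S3-like one (batches of 1,000, 1,000, and 500), which illustrates the reduction in round trips a bulk operation like VACUUM could get.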

Describe alternatives you've considered

Additional context

Our use case is the VACUUM operation in delta-rs, which has to delete many files in bulk.

wjones127 added the enhancement label Aug 30, 2022
@roeap (Contributor) commented Aug 30, 2022

I think Azure can support this as well via Blob Batch.

@roeap (Contributor) commented Sep 1, 2022

Also just stumbled across this for GCP :)

@tustvold (Contributor) commented Sep 2, 2022

This makes sense to me 👍

wjones127 self-assigned this Apr 8, 2023
wjones127 pushed a commit to delta-io/delta-rs that referenced this issue Jul 27, 2023
# Description
Bulk delete was added to the object store in
apache/arrow-rs#2615. It deletes multiple files within a single API call
if the underlying store supports it; otherwise, it performs concurrent
individual requests underneath.

This PR updates vacuum to use the object store changes. Currently only
S3 will see any benefit, since the default bulk delete is not overridden
for other backends.

# Related Issue(s)
- progresses #393