Create a performant, cloud-agnostic way to download & upload files to cloud buckets. #256
Comments
Here is the location for using gcloud for GCS files: The fallback for downloading is shutil, and the fallback for remote files is the Apache Beam file systems. I also have a different shutil optimized for GCS:
Here are some advantages of just using
A partial implementation of #256. My intention here is to see if this can speed up weather-dl requests.
Thanks @mahrsee1997 for pointing this out! https://cloud.google.com/blog/products/storage-data-transfer/new-gcloud-storage-enables-super-fast-data-transfers/
A partial implementation of #256. Here, we copy data using `gsutil cp` instead of a python routine. This speeds things up, since `gsutil` will parallelize uploads of large files.
* weather-dl now uses `gsutil cp` for file upload. A partial implementation of #256. My intention here is to see if this can speed up weather-dl requests.
* Temporary: no gsutil version.
* Bump weather-dl version.
* pinning gsutil version.
* Use gcloud alpha storage cp, which is even faster :)
* Set up gcloud sdk, accounting for runtime auth issue.
* Added error handling to the subprocess call for copying.
Co-authored-by: Rahul Mahrsee <[email protected]>
* fix: added import.
* Changing subprocess invocation to be more secure. Thanks @shoyer.
Co-authored-by: Stephan Hoyer <[email protected]>
* nit: dst, not dest.
* nit: remove gcloud pip dependency.
* Using gsutil for now until we upgrade project deps.
Co-authored-by: Rahul Mahrsee <[email protected]>
Co-authored-by: Stephan Hoyer <[email protected]>
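For readers following along, here is a minimal sketch of what "copy with `gsutil cp` in a subprocess, with error handling and no shell" could look like. The function names `build_copy_command` and `copy_with_gsutil` are hypothetical and are not taken from the weather-dl code; the sketch only assumes that `gsutil` is installed and authenticated on the machine.

```python
import shutil
import subprocess
from typing import List


def build_copy_command(src: str, dst: str) -> List[str]:
    """Return the argv list for `gsutil cp <src> <dst>` (hypothetical helper)."""
    # Passing a list instead of a single string keeps the shell out of the call,
    # so paths with spaces or special characters are not interpreted.
    return ["gsutil", "cp", src, dst]


def copy_with_gsutil(src: str, dst: str) -> None:
    """Copy src to dst by running gsutil in a subprocess (hypothetical helper)."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil was not found on PATH")
    try:
        subprocess.run(
            build_copy_command(src, dst),
            check=True,           # raise CalledProcessError on a non-zero exit
            capture_output=True,  # keep stderr so it can be reported
            text=True,
        )
    except subprocess.CalledProcessError as e:
        raise RuntimeError(
            f"Failed to copy {src!r} to {dst!r}: {e.stderr.strip()}"
        ) from e
```

A call such as `copy_with_gsutil("/tmp/data.nc", "gs://some-bucket/data.nc")` (bucket name made up for illustration) would then either succeed silently or raise with the CLI's error output attached.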
@mahrsee1997 did some benchmarking of different cloud utilities to see what would be the fastest. Our results show that
See discussion here: #254 (comment)
To investigate: `shutil.copyfileobj`?
One idea that @bahmandar has explored is calling `gsutil` in a subprocess (the CLI is really efficient at file transfer).
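As a point of comparison for the `shutil.copyfileobj` option, here is a rough sketch of a pure-Python streaming copy between two open file-like objects. The helper name `copy_stream` and the 8 MiB default buffer are assumptions chosen for illustration, not measured or recommended values; whether this approaches the CLI's speed would need benchmarking.

```python
import io
import shutil


def copy_stream(src, dst, buffer_size: int = 8 * 1024 * 1024) -> None:
    """Stream bytes from src to dst in chunks of buffer_size (hypothetical helper)."""
    # copyfileobj reads and writes in a loop, so the whole file is never
    # held in memory at once; `length` sets the chunk size.
    shutil.copyfileobj(src, dst, length=buffer_size)


# Usage with in-memory streams; any objects with read() and write() work,
# including file handles opened by a cloud file system library.
source = io.BytesIO(b"x" * 100)
target = io.BytesIO()
copy_stream(source, target, buffer_size=16)
```

Because it only relies on `read()` and `write()`, this approach stays cloud-agnostic, which is the trade-off against shelling out to a provider-specific CLI.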