Set new Google Cloud buckets as requester-pays #2083

Merged Sep 14, 2023 (3 commits)

@bemoody commented Sep 14, 2023

GCS buckets may be either "requester-pays" (the client accessing the data must specify a "project" that will be billed for egress costs), or "non-requester-pays" (any authorized client can access the data and the bucket's owner will be billed for egress costs).
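
For readers unfamiliar with the client side of this, here is a minimal sketch of how a consumer might name the billing project when reading from a requester-pays bucket. It assumes the `google-cloud-storage` Python library; the helper name and the bucket and project names are hypothetical and not part of this pull.

```python
def open_requester_pays_bucket(client, bucket_name, billing_project):
    """Return a bucket handle whose requests are billed to billing_project.

    `client` is assumed to be a google.cloud.storage.Client. The
    `user_project` argument tells GCS which project to charge for
    egress on a requester-pays bucket.
    """
    return client.bucket(bucket_name, user_project=billing_project)


# Hypothetical usage (requires credentials, so it is left commented out):
# from google.cloud import storage
# bucket = open_requester_pays_bucket(
#     storage.Client(), "example-bucket", "my-billing-project")
# data = bucket.blob("RECORDS").download_as_bytes()
```

Without `user_project`, requests to a requester-pays bucket are generally rejected unless the caller has billing permission on the project that owns the bucket.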

This option can be switched on or off at any time, and it would be nice to have a way to do so in the PhysioNet console. For the time being, we want to set all newly created buckets as "requester-pays" by default.

This pull also cleans up the logic to create the bucket in a single API request rather than two.

I don't anticipate any problems with this, but it hasn't been tested. I'd suggest we push this to the live server and test with one or two small projects - check that it works and the resulting bucket settings are correct. Please don't publish any big projects to GCP until we've tested this. We can wait to merge this if that's a problem.

Fixes #2079

Benjamin Moody added 3 commits September 14, 2023 13:45
The bucket_policy_only_enabled property (in
google.cloud.storage.bucket.IAMConfiguration) has been renamed to
uniform_bucket_level_access_enabled; the old name is deprecated.

When creating a GCS bucket, we need to set various bucket properties
in addition to the bucket name.  The
google.cloud.storage.Client.create_bucket method allows setting some
properties via keyword arguments.  However, the underlying HTTP API
allows setting all of the properties at once - which means we can
create the bucket and set its properties with a single request, rather
than making two requests and risking that the second one fails.
Note that the same pattern is used by physionet.gcs.create_bucket.

(It might be better if we also could set the IAM policy at the same
time - I'm not sure if this is possible or not - but on the other hand
it might be better *not* to make the bucket accessible until after the
project files have been uploaded.)
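
One way the single-request pattern described above could look with the Python client library is sketched below. This is an illustration under assumptions, not necessarily the code in this pull: the helper name and the `location` parameter are hypothetical. It relies on the library sending every locally modified bucket property as part of the create request.

```python
def create_configured_bucket(client, bucket_name, location=None):
    """Create a bucket with its properties set in one API request.

    `client` is assumed to be a google.cloud.storage.Client.
    """
    # Build a local Bucket object; no request is made yet.
    bucket = client.bucket(bucket_name)

    # Use the current property name; bucket_policy_only_enabled is the
    # deprecated spelling of the same setting.
    bucket.iam_configuration.uniform_bucket_level_access_enabled = True

    # Bill egress to the requester rather than the bucket owner.
    bucket.requester_pays = True

    # The properties changed above travel with the create request, so
    # there is no second request that could fail on its own.
    bucket.create(location=location)
    return bucket
```

The IAM policy is still applied separately in this sketch, which fits the note above that it may be preferable to grant access only after the files are uploaded.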

GCS buckets may be either "requester-pays" (the client accessing the
data must specify a "project" that will be billed for egress costs),
or "non-requester-pays" (any authorized client can access the data and
the bucket's owner will be billed for egress costs).

This option can be switched on or off at any time, and it would be
nice to have a way to do so in the PhysioNet console.  For the time
being, we want to set all buckets as "requester-pays" by default.
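
Since the option can be switched at any time, a console action for it might reduce to something like the sketch below. The helper name is hypothetical and no such console feature exists in this pull; the code assumes the `google-cloud-storage` library.

```python
def set_requester_pays(client, bucket_name, enabled):
    """Turn requester-pays on or off for an existing bucket.

    `client` is assumed to be a google.cloud.storage.Client.
    """
    # Local handle only; avoids an extra GET before the update.
    bucket = client.bucket(bucket_name)
    bucket.requester_pays = bool(enabled)

    # patch() sends only the properties changed on this object.
    # Assumption: the caller has owner-level access to the bucket;
    # other callers may also need to pass user_project to
    # client.bucket().
    bucket.patch()
    return bucket
```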

@tompollard commented

Not tested, but looks good to me. Let's merge and see what happens!

@tompollard merged commit bb42f4e into dev Sep 14, 2023
8 checks passed
@tompollard deleted the gcp-requester-pays branch September 14, 2023 20:52

@tompollard commented

@bemoody Now live, would you like to test on a couple of projects?

Linked issue: Setting requester-pays for google cloud buckets