[BEAM-12913] Enable configuration of query priority in ReadFromBigQuery #15536

chunyang · 2021-09-18T04:51:22Z

Changes:

Make ReadFromBigQuery submit queries with BATCH priority by default. This mirrors the default behavior in the Java BigQueryIO and the legacy native Dataflow IO.
Add a query_priority parameter to ReadFromBigQuery to allow toggling between INTERACTIVE and BATCH priority. Add a corresponding BigQueryQueryPriority object to hold the priority constants.
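The two changes above come down to a pair of string constants and a BATCH default. The following is a minimal sketch of that behavior, not the PR's actual implementation. It assumes plain string values `'INTERACTIVE'` and `'BATCH'`, which are the values BigQuery's API accepts. `resolve_query_priority` is a hypothetical helper written for illustration and is not part of Beam.

```python
# Minimal sketch (NOT the Beam source): a constants holder for the two
# BigQuery query priorities, plus a hypothetical helper that applies the
# BATCH default described in this PR.


class BigQueryQueryPriority(object):
  """Holds the string constants BigQuery accepts for a query job priority."""
  INTERACTIVE = 'INTERACTIVE'
  BATCH = 'BATCH'


def resolve_query_priority(query_priority=None):
  """Hypothetical helper: return the priority to submit, defaulting to BATCH.

  Raises ValueError for anything other than the two supported constants.
  """
  if query_priority is None:
    return BigQueryQueryPriority.BATCH
  allowed = (BigQueryQueryPriority.INTERACTIVE, BigQueryQueryPriority.BATCH)
  if query_priority not in allowed:
    raise ValueError(
        'query_priority must be one of %s, got %r' % (allowed, query_priority))
  return query_priority


print(resolve_query_priority())  # BATCH
print(resolve_query_priority(BigQueryQueryPriority.INTERACTIVE))  # INTERACTIVE
```

In a pipeline, a user who wants the old behavior would pass the new parameter explicitly, e.g. `ReadFromBigQuery(query=..., query_priority=BigQueryQueryPriority.INTERACTIVE)`. Leaving it unset keeps BATCH.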

Submitting BigQuery queries with BATCH priority lets BigQuery start them whenever idle resources are available. It also keeps queries submitted from Beam from counting toward the project's concurrent rate limit.
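At the BigQuery API level, priority is a single field on the query job configuration. The sketch below builds a request body in the shape used by the BigQuery REST `jobs.insert` method, where the field is `configuration.query.priority`. BigQuery treats a query as INTERACTIVE when this field is omitted. `build_query_job_body` is a hypothetical helper: it sends nothing over the network, and it is not how Beam's `bigquery_tools.py` is written.

```python
import json


def build_query_job_body(sql, priority='BATCH', use_legacy_sql=False):
  """Hypothetical helper: a BigQuery jobs.insert body for a query job.

  Field names follow the REST API's JobConfigurationQuery resource; if
  'priority' were omitted, BigQuery would run the job as INTERACTIVE.
  """
  if priority not in ('INTERACTIVE', 'BATCH'):
    raise ValueError('unsupported priority: %r' % priority)
  return {
      'configuration': {
          'query': {
              'query': sql,
              'useLegacySql': use_legacy_sql,
              'priority': priority,
          }
      }
  }


# Build (but do not submit) a BATCH-priority query job body.
body = build_query_job_body('SELECT 1')
print(json.dumps(body, indent=2))
```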


chunyang · 2021-09-18T04:57:10Z

R: @pabloem

codecov · 2021-09-21T17:02:24Z

Codecov Report

Merging #15536 (59b0fb9) into master (afccd52) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##           master   #15536      +/-   ##
==========================================
- Coverage   83.78%   83.77%   -0.01%
==========================================
  Files         444      444
  Lines       60189    60273      +84
==========================================
+ Hits        50427    50493      +66
- Misses       9762     9780      +18
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdks/python/apache_beam/io/gcp/bigquery.py | `75.68% <100.00%> (+0.14%)` | ⬆️ |
| sdks/python/apache_beam/io/gcp/bigquery_tools.py | `86.71% <100.00%> (+0.03%)` | ⬆️ |
| ...ks/python/apache_beam/runners/interactive/utils.py | `92.30% <0.00%> (-3.30%)` | ⬇️ |
| sdks/python/apache_beam/io/source_test_utils.py | `88.47% <0.00%> (-1.39%)` | ⬇️ |
| sdks/python/apache_beam/internal/metrics/metric.py | `90.42% <0.00%> (-1.07%)` | ⬇️ |
| sdks/python/apache_beam/testing/util.py | `96.71% <0.00%> (-0.44%)` | ⬇️ |
| ...hon/apache_beam/runners/worker/bundle_processor.py | `93.26% <0.00%> (-0.25%)` | ⬇️ |
| sdks/python/apache_beam/portability/common_urns.py | `100.00% <0.00%> (ø)` | |
| sdks/python/apache_beam/runners/common.py | `88.87% <0.00%> (+0.14%)` | ⬆️ |
| sdks/python/apache_beam/transforms/external.py | `77.70% <0.00%> (+3.30%)` | ⬆️ |

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afccd52...59b0fb9.
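As a sanity check on the Codecov numbers, line coverage here is hits divided by tracked lines. This assumes there are no partially covered lines, and the report lists none. The snippet recomputes both percentages and the change from the values in the coverage diff.

```python
# Values copied from the Codecov coverage diff for master and #15536.
base = {'lines': 60189, 'hits': 50427, 'misses': 9762}
head = {'lines': 60273, 'hits': 50493, 'misses': 9780}


def coverage_pct(report):
  """Line coverage as a percentage: hits / lines (assumes no partial lines)."""
  return 100.0 * report['hits'] / report['lines']


for name, report in (('master', base), ('#15536', head)):
  # Every tracked line is either hit or missed.
  assert report['hits'] + report['misses'] == report['lines']
  print(name, round(coverage_pct(report), 2))

delta = coverage_pct(head) - coverage_pct(base)
print('delta', round(delta, 2))
```

The unrounded change is about -0.007 percentage points. That is small enough to appear as 0.00% or -0.01%, depending on how it is rounded.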

satybald · 2021-09-21T17:51:26Z

sdks/python/apache_beam/io/gcp/bigquery.py

```diff
@@ -1915,6 +1927,11 @@ class ReadFromBigQuery(PTransform):
 that dataset, and will remove it once it is not needed. Job needs access
 to create and delete tables within the given dataset. Dataset name
 should *not* start with the reserved prefix `beam_temp_dataset_`.
+ query_priority (BigQueryQueryPriority): By default, this transform runs
```
I am wondering whether we can make BATCH priority the default in ReadFromBigQuery. What would be the use case for a user to make it INTERACTIVE? I just want to make sure that the exposed parameters are really necessary for the end user and that they conform to the style guide:

https://beam.apache.org/contribute/ptransform-style-guide/

chunyang

This PR does make the priority BATCH by default.

If you're asking about whether or not we should expose a query_priority parameter, I don't have strong opinions either way. My original commit (b66e4b1) makes it non-configurable, but after seeing that it's configurable in the Java BigQueryIO, I decided to make the Python side consistent.

satybald

Never mind my comment. I initially thought that users wouldn't need to be aware of this parameter. There's no point executing the query if there aren't enough resources, so BATCH priority makes more sense for a Dataflow job.

pabloem · 2021-09-23T21:43:19Z

Thanks for checking all of these. I was not sure about adding the extra parameter, but since Java exposes it, I think it makes sense for Python to expose it as well.

pabloem

LGTM!

y1chi · 2021-09-24T17:39:19Z

It looks like this PR is causing python postcommit to fail: https://ci-beam.apache.org/job/beam_PostCommit_Python37/4309/

@chunyang Could you PTAL.

chunyang · 2021-09-24T17:55:12Z

@y1chi yes I just saw that, will put together a fix right now.

Run BigQuery queries with batch priority

b66e4b1

chunyang force-pushed the cyang/rfbq-batch branch from 118767c to 07ce10f September 21, 2021 16:45

chunyang added 2 commits September 21, 2021 17:21

Allow changing query priority in ReadFromBigQuery

436fa06

Update CHANGES

59b0fb9

chunyang force-pushed the cyang/rfbq-batch branch from 07ce10f to 59b0fb9 September 21, 2021 17:21

aaltay requested a review from pabloem September 21, 2021 17:48

satybald reviewed Sep 21, 2021


pabloem reviewed Sep 23, 2021


pabloem merged commit 84c082d into apache:master Sep 23, 2021

chunyang deleted the cyang/rfbq-batch branch September 24, 2021 17:55

chunyang mentioned this pull request Sep 24, 2021

[BEAM-12913] Pass query priority from ReadAllFromBigQuery #15584 (merged)
