
Add invocation_id to BigQuery jobs #2809

Merged
merged 6 commits into from
Oct 7, 2020

@mescanne (Contributor) commented Oct 3, 2020

This resolves #2808.

Description

It adds the invocation_id as a label on every BigQuery job. This is preferable to deriving labels from the query comment because labels have a 128-byte limit, which a query comment could easily exceed.
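A minimal sketch of the idea, not dbt's actual code: a hypothetical `build_job_labels` helper that tags a job's labels with the run's invocation_id. The label key `dbt_invocation_id` is an assumption, and so is the 63-character cap in `MAX_LABEL_VALUE_LEN`; check it against BigQuery's current label requirements. A UUID-style invocation_id (36 characters) fits comfortably, while a JSON query comment generally does not.

```python
import uuid
from typing import Optional

# Assumed cap on a single BigQuery label value; verify against BigQuery's docs.
MAX_LABEL_VALUE_LEN = 63


def build_job_labels(invocation_id: str, extra: Optional[dict] = None) -> dict:
    """Return a labels dict for a BigQuery job, tagged with the dbt invocation_id.

    Hypothetical helper: the key name "dbt_invocation_id" is an assumption.
    """
    labels = dict(extra or {})
    value = invocation_id.lower()  # BigQuery label values must be lowercase
    if len(value) > MAX_LABEL_VALUE_LEN:
        raise ValueError(f"label value too long: {len(value)} characters")
    labels["dbt_invocation_id"] = value
    return labels


if __name__ == "__main__":
    # A UUID string stands in for the per-run invocation_id.
    invocation_id = str(uuid.uuid4())
    print(build_job_labels(invocation_id, {"team": "analytics"}))
```

With the `google-cloud-bigquery` client, a dict like this would be passed as `bigquery.QueryJobConfig(labels=...)` when submitting the query.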

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot added the cla:yes label Oct 3, 2020

@jtcohen6 (Contributor) left a comment

@mescanne Thanks for the contribution, and sorry for my initial hesitations here. The change is really straightforward and, as I think more about it, the addition is uncontroversial.

I don't think this PR fully resolves #2483, which asks for the ability to configure job tags/labels. It definitely resolves the narrower functionality requested in #2808, which I've reopened. We may end up deciding that, by using the invocation_id as a foreign key between dbt artifacts (run_results.json) and BigQuery query history, we have access to all the metadata we need and no longer need more configurable tags/labels.
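To make the foreign-key idea concrete, here is a minimal Python sketch, not dbt's code: it reads the invocation_id from a run_results.json artifact and picks out the job records that carry it. It assumes the artifact exposes `metadata.invocation_id` (present in newer artifact schemas; older versions may differ), that the label key is `dbt_invocation_id`, and that job records mirror INFORMATION_SCHEMA rows, where `labels` is a list of key/value pairs. Both function names are hypothetical.

```python
import json
from typing import Iterable, List


def invocation_id_from_run_results(path: str) -> str:
    """Read the invocation_id from a dbt run_results.json artifact.

    Assumes a top-level "metadata" object with an "invocation_id" field.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)["metadata"]["invocation_id"]


def jobs_for_invocation(jobs: Iterable[dict], invocation_id: str) -> List[dict]:
    """Keep the job records whose labels carry the given invocation_id.

    Each job is assumed to look like an INFORMATION_SCHEMA jobs row, with
    `labels` as a list of {"key": ..., "value": ...} entries.
    """
    matches = []
    for job in jobs:
        labels = {lbl["key"]: lbl["value"] for lbl in job.get("labels", [])}
        if labels.get("dbt_invocation_id") == invocation_id:
            matches.append(job)
    return matches
```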

So my only ask would be that you update the changelog to reflect resolution of #2808 rather than #2483.

@mescanne (Contributor, Author) commented Oct 7, 2020

Done: the changelog now references #2808.

The team I'm working with already records dbt runs, keyed by invocation_id, in an audit table in BigQuery. Linking that table to INFORMATION_SCHEMA through this label is a very natural and easy solution.
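The audit-table join described above could look roughly like the query produced by this hypothetical helper. It assumes the `dbt_invocation_id` label key, an audit table with an `invocation_id` column, and the region-qualified `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view, whose `labels` column is an array of key/value structs. Identifiers are interpolated directly into the SQL, so only pass trusted values.

```python
def invocation_jobs_sql(project: str, region: str, audit_table: str) -> str:
    """Build a query joining a (hypothetical) dbt audit table to BigQuery job history.

    Each job row is expanded over its labels; rows whose dbt_invocation_id
    label matches an audit-table invocation_id are returned.
    """
    # NOTE: plain string interpolation; do not feed untrusted input.
    return f"""
SELECT
  a.invocation_id,
  j.job_id,
  j.creation_time,
  j.total_bytes_processed
FROM `{project}`.`region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
CROSS JOIN UNNEST(j.labels) AS l
JOIN `{audit_table}` AS a
  ON l.value = a.invocation_id
WHERE l.key = 'dbt_invocation_id'
""".strip()


if __name__ == "__main__":
    print(invocation_jobs_sql("my-project", "us", "my-project.audit.dbt_runs"))
```

Filtering on the label key rather than parsing query text keeps the join cheap: only the small `labels` array is inspected for each job.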

Another approach I've considered: the INFORMATION_SCHEMA jobs table also stores the raw query text, so you could use BigQuery's regex and JSON functions to extract information from the query_comment. However, this is potentially costly (it means reading every query ever run, some of them very large) and quite tricky to express in SQL. An audit table holding the logical attributes of each run, joined on invocation_id, is fairly straightforward.
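For comparison, a minimal Python sketch of the query-comment parsing approach: a hypothetical `parse_query_comment` that pulls the first JSON object out of a `/* ... */` comment in the raw query text. It assumes the query comment is a JSON object wrapped in a block comment, as dbt's default comment is; a custom query-comment configuration may produce something else entirely.

```python
import json
import re
from typing import Optional

# A JSON object inside a /* ... */ block comment. The lazy match still handles
# nested braces, because it only stops at a "}" followed by "*/".
_COMMENT_RE = re.compile(r"/\*\s*(\{.*?\})\s*\*/", re.DOTALL)


def parse_query_comment(query: str) -> Optional[dict]:
    """Return the first JSON object found in a /* ... */ comment, or None."""
    for match in _COMMENT_RE.finditer(query):
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # braces but not valid JSON; try the next comment
    return None
```

Doing the same inside BigQuery (for example with `REGEXP_EXTRACT` and `JSON_EXTRACT_SCALAR`) works the same way, but it requires scanning the full text of every historical query, which is the cost concern raised above.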

Adding the stage and other step-level information as labels could also be useful, but one step at a time.

Thanks a lot.

@jtcohen6 jtcohen6 merged commit 93168fe into dbt-labs:dev/kiyoshi-kuromiya Oct 7, 2020
Successfully merging this pull request may close these issues.

BigQuery adapter should record in a label the dbt invocation_id