
[Epic] Applied State (part 2) #9425

Closed
4 of 6 tasks
graciegoheen opened this issue Jan 23, 2024 · 9 comments

Comments

@graciegoheen
Contributor

graciegoheen commented Jan 23, 2024

This epic comprises the remaining work on the Applied State initiative: supporting better visibility into the current state of the database in a more performant way.

dbt-core 1.8

  1. Labels: bug, High Severity, backport 1.7.latest. Assigned: QMalcolm
  2. Labels: enhancement, Impact: CA, user docs. Assigned: MichelleArk
  3. Labels: enhancement. Assigned: MichelleArk, QMalcolm
  4. Labels: enhancement, help_wanted. Assigned: emmyoop

next

  1. Labels: enhancement
  2. Labels: enhancement, freshness, user docs
@adamcunnington-mlg

@graciegoheen / @MichelleArk sorry for the direct question on this, but I'm struggling to understand from the docs / issue / PRs: has support for batch metadata collection been added for BigQuery specifically? I saw the general work in dbt-adapters and the Snowflake- and Redshift-specific changes, but I don't think BigQuery was covered? Thank you

@jtcohen6
Contributor

@adamcunnington-mlg
Copy link

@jtcohen6 I think that is just the initial support and not the batch route, which is the critical bit. Please advise, many thanks

@MichelleArk
Contributor

Hey @adamcunnington-mlg -- I've summarized some spiking done to evaluate the cost/benefit of implementing a batch-route for metadata freshness in BigQuery here: dbt-labs/dbt-bigquery#938. There are more details in the spike report, but my overall conclusion is that there isn't currently a way to implement a batch-strategy that achieves performance improvements for metadata-based source freshness given limitations of BigQuery's Python SDK.
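To make the per-table vs. batch distinction concrete, here is a hypothetical sketch (not dbt-bigquery's actual implementation, and the function name is invented for illustration): rather than issuing one metadata round trip per source table, a batch strategy could fetch last-modified timestamps for many tables at once via BigQuery's legacy `__TABLES__` metadata view.

```python
def build_batch_freshness_sql(project: str, dataset: str, tables: list[str]) -> str:
    """Build one SQL statement that returns the last-modified timestamp
    for every listed table in a dataset, replacing N per-table calls."""
    table_list = ", ".join(f"'{t}'" for t in tables)
    return (
        "SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified "
        f"FROM `{project}.{dataset}.__TABLES__` "
        f"WHERE table_id IN ({table_list})"
    )

# The resulting statement could then be executed once, e.g. with
# google.cloud.bigquery.Client().query(sql), instead of calling
# client.get_table() once per source table.
sql = build_batch_freshness_sql("my-project", "analytics", ["orders", "customers"])
print(sql)
```

Per the spike linked above, the blocker is not composing such a query but limitations in BigQuery's Python SDK that prevent the batch strategy from actually improving performance.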

@adamcunnington-mlg

@MichelleArk I believe the conclusion here is mistaken. I've left some details in reply to your more comprehensive response: dbt-labs/dbt-bigquery#938 (comment)

These details are also in the original FR; #7012 (comment)

@graciegoheen
Contributor Author

I'm going to close this issue out, since it was scoped to our 1.8 release.

@adamcunnington-mlg

@graciegoheen I just wanted to confirm my understanding: the batch route for BigQuery metadata is still not implemented. Michelle completed a spike and there's a branch with changes on it, but it wasn't finished or merged. Is there an ETA for when that will be done? It feels like 90% of the work was done.

@graciegoheen
Contributor Author

graciegoheen commented Sep 30, 2024

Hi @adamcunnington-mlg - that is correct, it is still not implemented.

@jtcohen6 left a response in the issue in dbt-bigquery back in May (see here):

> If you could provide us with concrete numbers for both approaches, that would help me a lot in deciding on the appropriate next step here — to switch this behavior for everyone (strictly better), to consider implementing it as another configurable option (with both pros & cons relative to the v1.8 behavior), or to put it down for good ("do nothing").

This is not a top priority for us atm, but would be helpful to understand the difference in performance between these approaches to decide on the appropriate next step.
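The "concrete numbers for both approaches" jtcohen6 asks for could be gathered with a small timing harness along these lines. This is a hypothetical sketch: `time_strategy` and the two stand-in callables are invented for illustration; in practice each callable would wrap the real BigQuery client calls for the per-table and batch strategies.

```python
import time


def time_strategy(fetch, n_repeats: int = 3) -> float:
    """Return the best-of-n wall-clock time for a metadata-fetch strategy."""
    best = float("inf")
    for _ in range(n_repeats):
        start = time.perf_counter()
        fetch()  # run the strategy once
        best = min(best, time.perf_counter() - start)
    return best


# Stand-ins simulating 50 sequential per-table metadata calls vs. one batch query.
# Replace these with real client calls to produce the numbers the maintainers want.
per_table = lambda: [time.sleep(0.001) for _ in range(50)]
batched = lambda: time.sleep(0.002)

print(f"per-table: {time_strategy(per_table):.3f}s, batch: {time_strategy(batched):.3f}s")
```

Using best-of-n rather than an average reduces noise from warm-up and network jitter, which matters when the per-call overhead being measured is small.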

@graciegoheen
Contributor Author

Ah! @adamcunnington-mlg I see you followed up in this issue.

The implementation has been de-risked, so there's nothing blocking this work. This is not a focus for our team right now, but we would definitely review a PR for it if someone from the community wanted to open one up :)
