Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-427] When cloning an entire database, I'd like to recreate all views to redirect references from self to the newly cloned self #118

Closed
ybressler opened this issue Mar 25, 2022 · 2 comments
Labels
enhancement New feature or request Stale

Comments

@ybressler
Copy link

ybressler commented Mar 25, 2022

Describe the feature

When cloning an entire database, any views being cloned will retain their original SQL statements. In the case where these views contain references to the current DB (parent), I want these references to be redirected to the cloned DB (child). Additionally, if I am cloning multiple DB's, it would be great to be able to point any references via a mapping – ex. provide a python dictionary (or yaml eq) and have any references overwritten or redirected.

Describe alternatives you've considered

  • DBT solution: I can recreate all models which are views after cloning using a specific target. But this method is error prone and too complex a solution for my taste.
  • Snowflake solution: Use materialized views instead of regular views (which are indeed cloned in contrast to regular views). Note, this functionality was released pretty recently - Feb 2022.

Additional context

Currently, cloning occurs as a SQL command to Snowflake – DBT doesn't have a built in way of being aware of these changes. The only way to transmit these changes is to switch targets and assume naively that everything magically appeared (which is fine for the most part, but can lead to issues in cases like these).

Related issue: dbt-labs/dbt-core#4959

Who will this benefit?

Anyone using DBT with Snowflake who has considered or is already using cloning. (As an aside, Snowflake's zero copy cloning is one of its features which sets it apart from its competitors – it would be incredibly valuable to have a more DBT-native solution available).

Are you interested in contributing this feature?

Certainly I am, but would need to have someone else own the project management of this task. (I can do this "on the side" so to say.)

@ybressler ybressler added enhancement New feature or request triage labels Mar 25, 2022
@github-actions github-actions bot changed the title When cloning an entire database, I'd like to recreate all views to redirect references from self to the newly cloned self [CT-427] When cloning an entire database, I'd like to recreate all views to redirect references from self to the newly cloned self Mar 25, 2022
@jtcohen6
Copy link
Contributor

@ybressler Thanks for opening, and sorry for the delay getting back to you!

Bad solution

As you mention, this would do the trick:

  1. Clone the entire database
  2. dbt run --select config.materialized:view --threads 100

It's just not the most savory of solutions :)

Better solution

This is exactly the question you raised in dbt-labs/dbt-core#4959. It's possible (and pretty easy) to override ref to exclude the database identifier.

Docs: https://docs.getdbt.com/reference/dbt-jinja-functions/builtins
Discourse: https://discourse.getdbt.com/t/performing-a-blue-green-deploy-of-your-dbt-project-on-snowflake/1349

The trade-off is, if you do this all the time, you can't create your objects across multiple databases. I remember working on a project that wanted it both ways, and ended up setting a rule like:

  • Views can only select from tables in the same database
  • Tables can select from objects in other databases
-- Render identifiers without a database if the current model is materialized as a view.
-- Otherwise, include the database.

{% macro ref(model_name) %}

  {% set rel = builtins.ref(model_name) %}
  {% set not_a_view = (config.get('materialized') != 'view') %}
  {% do return(rel.include(database=not_a_view)) %}

{% endmacro %}

Best solution???

I completely buy the thing you're saying here:

As an aside, Snowflake's zero copy cloning is one of its features which sets it apart from its competitors – it would be incredibly valuable to have a more DBT-native solution available

I think we'll see cloning supported by more databases, and it will be all-the-more important for dbt to support a built-in clone method/macro that can abstract across their differences.

I've also long wanted to support a "clone" mode of --defer, rather than just rewriting upstream references: dbt-labs/dbt-core#5095. (This is important for feeding "Slim CI" schemas into BI tools. With traditional --defer, unbuilt upstream models are totally missing.)

Following that approach, dbt could clone upstream objects "as needed." Cloning a table would look like cloning a table. Cloning a view may look like cloning + rewriting refs, or (much simpler) just running that view.

If you're running hundreds of models, the accumulated time of doing this is probably more than a one-shot create database clone .... If you're just running a handful of models, though, you can skip a lot of unneeded cloning. If you've changed a number of models, and you want to inter-weave the running of changed models and the cloning of unchanged models, all in DAG order—such a construct would give dbt the ability to do it.

@jtcohen6 jtcohen6 removed the triage label Apr 27, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

2 participants