Speed up Snowflake column comments, while still avoiding errors #3543

Merged: jtcohen6 merged 2 commits into develop from fix/snowflake-persist-docs-columns-slowdown on Jul 7, 2021

jtcohen6
Contributor

@jtcohen6 jtcohen6 commented Jul 7, 2021

resolves #3541, slack thread
see also: #3149, #3039

Description

#3149 did a wonderful job of resolving the undesirable behavior (originally reported in #3039) whereby a column description specified in YAML would cause a model build to fail if the column doesn't actually exist in the model query. This "strictness" could be a feature someday, but it's not how we think about properties (and especially descriptions) today.

Unfortunately, the approach taken in #3149 results in much, much slower builds for models with many description-bearing columns: from seconds to minutes. This is because the Snowflake Python connector requires running each semicolon-delimited statement on its own, so the statements run in sequence, hundreds of times. (By contrast, dbt-postgres, which also uses per-column comment on statements, can run big batches of semicolon-delimited statements all at once.)

This PR takes an alternative approach (option 2 that I outlined in the original issue): check which columns exist in the just-created table, via adapter.get_columns_in_relation, and then compare the result against the dictionary of columns with descriptions defined.
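
The comparison described above can be sketched in plain Python. This is an illustration only, not the macro code from this PR: the function name columns_to_comment and its arguments are hypothetical. existing_columns stands in for the column names that adapter.get_columns_in_relation would report, and column_descriptions stands in for the dictionary of described columns. The sketch assumes Snowflake's usual behavior of storing unquoted identifiers in upper case, so a name from YAML is also tried in its upper-cased form.

```python
from typing import Dict, Iterable


def columns_to_comment(
    existing_columns: Iterable[str],
    column_descriptions: Dict[str, str],
) -> Dict[str, str]:
    """Keep only the described columns that exist in the relation.

    Hypothetical helper for illustration. A described column is kept if
    its name matches an existing column as written, or after upper-casing
    (assumption: unquoted Snowflake identifiers are stored in upper case).
    """
    existing = set(existing_columns)
    return {
        name: description
        for name, description in column_descriptions.items()
        if name in existing or name.upper() in existing
    }


# Example: "nickname" is described in YAML but is not in the built table,
# so it is skipped instead of causing an error.
existing = ["ID", "FIRST_NAME"]
described = {
    "id": "Primary key",
    "first_name": "Given name",
    "nickname": "Not in the model query",
}
selected = columns_to_comment(existing, described)
print(sorted(selected))  # ['first_name', 'id']
```

Filtering up front costs one metadata lookup per relation, rather than one existence check per column, which is the point of the change.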

I'd like to sneak this into v0.20.0 final if possible :)

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR — Existing tests, including the one added by Check if a snowflake column exists before altering its comment #3149
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@jtcohen6 jtcohen6 requested a review from leahwicz July 7, 2021 20:18
@cla-bot cla-bot bot added the cla:yes label Jul 7, 2021
@jtcohen6 jtcohen6 merged commit 85627aa into develop Jul 7, 2021
@jtcohen6 jtcohen6 deleted the fix/snowflake-persist-docs-columns-slowdown branch July 7, 2021 22:18
jtcohen6 added a commit that referenced this pull request Jul 7, 2021
* Have our cake and eat it quickly, too

* Update changelog
Successfully merging this pull request may close these issues.

Snowflake comments with 0.20-rc2 create a query per comment