Use show xxx queries instead of information schema on Snowflake #1999
@drewbanin I ran a quick test against a project that takes a long time to start. Running a specific model selector that takes about 80 sec with dbt 0.15, this new version took 40 sec. This project uses 9 custom schemas. Since the introspection queries are run sequentially, the time-to-first-model is proportional to the number of schemas in the project. What I noticed is that the Can you use |
Awesome, thanks for giving it a spin @pedromachados!
I noticed this too! This happens because Snowflake does not allow us to submit multiple statements in the same query. Accordingly, there's some code in dbt that splits the So, one other option here is to just run I'm happy to sub out Couple other things here:
fyi, i just learned that roles need monitor privileges to run desc on a warehouse (usage + operation are not sufficient), whereas the actual error message is "insufficient permissions to operate on X". desc could have quirky privilege requirements
@drewbanin I did another run. From start to finish, it took about 60 sec:
10 schemas are inspected (8 custom + seed + default). I agree that dbt could be smarter about analyzing only schemas involved in a given run. The queries that use |
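The idea of inspecting only the schemas involved in a given run can be sketched roughly as below. This is an illustration under assumptions, not dbt's actual code: the function name `schemas_to_inspect`, the mapping from model name to `(database, schema)`, and the example model names are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical type: a (database, schema) pair that would need one introspection query.
SchemaKey = Tuple[str, str]


def schemas_to_inspect(
    model_schemas: Dict[str, SchemaKey],
    selected_models: Iterable[str],
) -> Set[SchemaKey]:
    """Return only the schemas that the selected models are built into."""
    return {
        model_schemas[name]
        for name in selected_models
        if name in model_schemas
    }


# Hypothetical project: three models spread over three schemas.
project = {
    "orders": ("analytics", "sales"),
    "customers": ("analytics", "crm"),
    "country_codes": ("analytics", "seeds"),
}

# Selecting a single model means a single schema to introspect,
# rather than one sequential query per schema in the project.
needed = schemas_to_inspect(project, ["orders"])
```

Since the introspection queries run one after another, cutting the set of schemas this way would shrink the time-to-first-model roughly in proportion.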
@jtalmi you said
I did notice some weird quirks around @nehiljain were you able to run dbt with this branch? The information schema queries on the Snowflake cluster that I have access to are relatively fast, so I'm not able to tell if |
These queries don't appear to be meaningfully faster than hitting the information schema in practice :/ Closing this out, but happy to reopen if anyone finds that they're getting better performance characteristics with this approach.
Work in progress. This branch uses queries like
show xxx in yyy
instead of hitting the information schema. The hope is that this approach is more performant than the existing approach.
TODO:
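For readers unfamiliar with the two query styles being compared, here is a minimal sketch of what each might look like for listing the relations in one schema. The exact statements used on this branch are not shown in the thread, so both query strings and both helper names are assumptions; `show terse objects in schema` is used here as one plausible instance of `show xxx in yyy`. Identifiers are interpolated without quoting or escaping, which is fine for illustration only.

```python
def information_schema_query(database: str, schema: str) -> str:
    """Existing approach (assumed shape): read the information schema."""
    return (
        f"select table_name, table_type "
        f"from {database}.information_schema.tables "
        f"where table_schema ilike '{schema}'"
    )


def show_query(database: str, schema: str) -> str:
    """Approach on this branch (assumed shape): a metadata-only SHOW command."""
    return f"show terse objects in schema {database}.{schema}"


# Hypothetical database and schema names.
old_sql = information_schema_query("analytics", "sales")
new_sql = show_query("analytics", "sales")
```

Either way, one statement is issued per schema, so the number of round trips is the same; only the cost of each statement could differ.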