
Support temp tables in Snowflake "table" materializations #2725

Closed
gil-walzer-zocdoc opened this issue Aug 24, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@gil-walzer-zocdoc

Describe the feature

Please allow us to specify a "temporary" option in the config that will create a temp table materialization.

https://github.com/fishtown-analytics/dbt/blob/dev/marian-anderson/plugins/snowflake/dbt/include/snowflake/macros/materializations/table.sql#L26
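
For illustration, the option could look something like this in a model config. The `temporary` flag is what I'm proposing, not something dbt supports today, and the model/ref names are made up:

```sql
-- models/int_sales_staging.sql (hypothetical model)
-- `temporary=true` is the *proposed* option from this issue, not an existing dbt config
{{ config(materialized='table', temporary=true) }}

select *
from {{ ref('raw_sales') }}
```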

Describe alternatives you've considered

I could create a custom materialization that does what is described, but the change seems sufficiently small and useful to go in DBT's materialization.

Additional context

Snowflake offers two alternatives to permanent tables: temporary tables and transient tables. Temp tables are dropped at the end of the session, while transient tables must be explicitly dropped or they continue to incur storage charges. DBT's table materialization currently forces users to use transient tables, and there should be a way to opt out.

Who will this benefit?

Users who need intermediate tables in their projects but don't want those tables to persist between runs would benefit from having them be temporary. Projects with large datasets and frequent logic changes to their intermediate transforms fit this description.

Are you interested in contributing this feature?

I would be willing to make this change

@gil-walzer-zocdoc gil-walzer-zocdoc added enhancement New feature or request triage labels Aug 24, 2020
@jtcohen6
Contributor

jtcohen6 commented Aug 26, 2020

Hey @gil-walzer-zocdoc, what's your use case for temp tables as a materialization?

dbt opens several different Snowflake connections (sessions) within a given run, even more so when running with multiple threads. Since temp tables are dropped at the end of each session, they don't stick around for downstream models to select from them.

We do support a temporary argument to the create_table_as macro, and we use temp tables as a tool within more complex materializations. E.g. the Snowflake incremental materialization creates a temp table as the first step in an incremental run.
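
For reference, a custom materialization (or an override of the built-in one) could pass that flag through today. A minimal sketch, using the standard materialization context variables (`target_relation`, `sql`) and the existing `create_table_as` and `statement` macros; this is a sketch, not the built-in table materialization:

```sql
-- Inside a custom materialization: build the model's results as a temp table.
{% call statement('main') %}
  {{ create_table_as(True, target_relation, sql) }}
{% endcall %}
```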

If you don't want those models sticking around as tables between runs—to avoid storage costs, I'm guessing?—I'd advise:

  • Materializing them as ephemeral models or views instead. (This wouldn't work as well for large datasets.)
  • Adding an on-run-end hook that executes a series of drop table if exists statements.
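
For the hook route, a minimal sketch in dbt_project.yml (the table names below are made up; `target.schema` resolves to the schema dbt built into):

```yaml
# dbt_project.yml -- clean up intermediate tables at the end of every run
# (table names here are hypothetical)
on-run-end:
  - "drop table if exists {{ target.schema }}.int_sales_staging"
  - "drop table if exists {{ target.schema }}.int_orders_staging"
```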

@jtcohen6 jtcohen6 removed the triage label Aug 26, 2020
@gil-walzer-zocdoc
Author

Ah, OK. I didn't realize dbt actually opens several Snowflake connections; that certainly dashes my hopes.

I described our use case in the description: we're operating on a large dataset, but the domain is still pretty new to our business, so our transforms are still going through a lot of logic changes. We wanted to store intermediate transforms as temp tables rather than CTEs, but we also were running into issues where the transient table would be created with a set of columns that wouldn't match the logic in a subsequent run.

I should mention that we're leveraging a Snowflake database clone to run these transforms on while we develop. I came up with a custom method to set up clones and point Snowflake towards them via the profiles.yml file, but I'd be interested to hear if you or your team have anything in mind to natively support using Snowflake clones in DBT.
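
Roughly, the "pointing towards them" part is just a profiles.yml target whose database is the clone, something like this sketch (profile, account, and database names are made up):

```yaml
# profiles.yml -- a dev target that reads from and writes to a cloned database
# (all names here are hypothetical)
my_profile:
  target: dev_clone
  outputs:
    dev_clone:
      type: snowflake
      account: my_account
      user: my_user
      authenticator: externalbrowser
      role: transformer
      database: analytics_dev_clone   # the zero-copy clone
      warehouse: transforming
      schema: dbt_dev
      threads: 4
```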

@jtcohen6
Contributor

Right on, that makes sense.

we also were running into issues where the transient table would be created with a set of columns that wouldn't match the logic in a subsequent run.

I don't think I understand this piece. In subsequent runs, does the transient intermediate table encounter a mismatch with data changes in upstream sources, or with logic changes in downstream models? Wouldn't you be rerunning the intermediate model (and recreating the transient table) if it's within the modified section of the DAG?

Zero-copy cloning is a powerful Snowflake feature. Since it's available as DDL statements, we've found it to play nicely with existing dbt constructs (hooks and operations), and we haven't felt the need to wrap it more natively in an adapter method or materialization. Claire has a great discourse post about quickly spinning up a dev environment via ZCCs.
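
As a sketch of the operations route (macro and database names are made up; this is just plain Snowflake DDL wrapped in a macro, not an official dbt feature):

```sql
-- macros/clone_database.sql (hypothetical macro and database names)
-- Zero-copy clone a database; runnable via `dbt run-operation`
{% macro clone_database(source_db, target_db) %}
  {% set sql %}
    create or replace database {{ target_db }} clone {{ source_db }};
  {% endset %}
  {% do run_query(sql) %}
  {{ log("Cloned " ~ source_db ~ " into " ~ target_db, info=True) }}
{% endmacro %}
```

Invoked with something like: dbt run-operation clone_database --args '{source_db: analytics, target_db: analytics_dev}'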

@gil-walzer-zocdoc
Author

The issue was actually just for a single model which had a data change between two runs, not for data changes between models. We were using an incremental materialization, which was possibly incorrect.

What I was seeing was this (and remember, this is only about a single model): on the first run of our DAG (on a fresh database), DBT created transient tables with the columns specified in the model. On a subsequent run, the output columns of the model changed, and the materialization prepared a new temp table (suffixed with _dbt_tmp) with the correct columns, but did not replace the transient table before attempting the insert.

If you believe we're misusing the incremental materialization and would be better served with an alternative, please let us know.

@jtcohen6
Contributor

Got it! I don't think you're misusing the incremental materialization, though you should be aware that the inability to capture column additions/deletions/changes is a significant limitation. There's a long-lived issue with proposed remedies: #1132. Generally, models that frequently update their columns are poor candidates for incremental builds.
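
In the meantime, the usual workaround when an incremental model's columns change is to rebuild just that model with a full refresh, e.g. (model name made up):

```
dbt run --full-refresh -m my_incremental_model
```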

I'm going to close this issue, in favor of the bigger-picture discussion, since there isn't a specific code change we'll be making to support temp tables as a materialization.

@gil-walzer-zocdoc
Author

Sounds good. Thanks for the discussion.

@gil-walzer-zocdoc
Author

Hi @jtcohen6, we at Zocdoc have started looking into supporting concurrent DBT runs without threading (as separate processes). I wanted to double-check that your point above is still accurate: that dbt's Snowflake adapter opens and closes sessions between models. Is that still true?

@jtcohen6
Contributor

Yes, that's still true. This rough mapping holds: one model = one materialization = one Snowflake session/connection (possibly more if additional queries are run before/after)

@jim256

jim256 commented Jun 21, 2022

Hey, @jtcohen6.

We're migrating from traditional SQL transformations, where we used temp tables because the datasets were large. I'd love to stick with that approach instead of using CTEs, so I'm thinking transient tables. I also find temp tables more readable, easier to debug, more modular, etc.

The downside with anything but CTEs in dbt is naming conflicts. We have 100+ transformations that run every hour, and it's likely that two separate transformations would each create a "sales" temp table that doesn't use the same logic. Since temp tables are scoped to their session, we didn't have to worry about the temp table names of one transformation conflicting with those of another. I was hoping temp tables in dbt would do the trick, but it sounds like that's out.

Any recommendations on how to avoid naming conflicts without giving every temp table an unwieldy name (like [this_transformation_name]_sales)?

@KeeonTabrizi

Personally, I would love to see support for temporary tables, along with the ability to run a model on a single thread to avoid the issues with the multiple sessions DBT creates.
