Snowflake create or replace #1409
This is a really good fix for the `on false` issue with Snowflake's merge statements. Do you think it makes sense to put this logic here, or should we move it into the Snowflake implementation of `get_merge_sql`? I like the idea of making materializations represent business logic instead of database logic, as they become a lot more generalizable. Curious what you think!
I think that makes total sense! I was actually feeling a bit awkward about having this logic sit there, but I didn't think too much about where else it could live. This is a very good idea, so I'm going to go ahead and change it as you suggest.
Great! I think this would be the place to implement it. If `unique_key` is provided, then we can proceed with `common_get_merge_sql`; otherwise, we should return the `insert` statement you've built here.
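To make the suggested split concrete, here is a minimal Python sketch that only builds SQL strings. It is not the actual Jinja macro: the function names `common_merge_sql` and `snowflake_merge_sql`, their parameters, the `dest`/`src` aliases and the table names are all hypothetical stand-ins for `common_get_merge_sql` and the Snowflake `get_merge_sql` override discussed above.

```python
from typing import Optional, Sequence


def common_merge_sql(target: str, source: str, unique_key: str,
                     columns: Sequence[str]) -> str:
    """Hypothetical stand-in for a shared merge builder: update rows whose key
    already exists in the target, insert the rest."""
    col_list = ", ".join(columns)
    src_values = ", ".join(f"src.{c}" for c in columns)
    assignments = ", ".join(f"{c} = src.{c}" for c in columns)
    return (
        f"merge into {target} as dest\n"
        f"using {source} as src\n"
        f"on src.{unique_key} = dest.{unique_key}\n"
        f"when matched then update set {assignments}\n"
        f"when not matched then insert ({col_list}) values ({src_values})"
    )


def snowflake_merge_sql(target: str, source: str, unique_key: Optional[str],
                        columns: Sequence[str]) -> str:
    """Hypothetical Snowflake override: merge only when a unique_key is given."""
    if unique_key is not None:
        return common_merge_sql(target, source, unique_key, columns)
    # No key to match on means nothing can be updated, so a plain insert
    # appends every staged row instead of running a merge whose join
    # condition is always false.
    col_list = ", ".join(columns)
    return f"insert into {target} ({col_list}) select {col_list} from {source}"


if __name__ == "__main__":
    cols = ["id", "status"]
    print(snowflake_merge_sql("analytics.orders", "orders__tmp", "id", cols))
    print(snowflake_merge_sql("analytics.orders", "orders__tmp", None, cols))
```

Keeping the `unique_key` check inside the adapter-specific function leaves the materialization itself free of Snowflake-specific branching, which is the point of the suggestion.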
Yep, that's exactly what I just started doing!
One thing: I realised there are no incremental deletes anymore, and the merge statement doesn't call a delete. Do you think we need one here?
The previous implementation of `incremental` models on Snowflake used `delete` statements to approximate an upsert. Before, we did:
So, records were only deleted if they were going to be immediately re-inserted. We'd actually prefer not to call a `delete`, and instead use the `merge` to update these rows in place. This should be handled by the `when matched` clause in the `merge` statement.
I do think there's a conversation to be had about performance. I wonder if there's any difference between:
An example:
Destination table
Temp table (generated from the model `select`)
Desired destination table state
So, there are two ways to accomplish this desired end-state. We can either (pseudocode):
1. delete + insert
2. update + insert (via `merge`)
This does raise an interesting question about edge-case behavior with `merge`. What happens if there are duplicate `unique_key` values in either 1) the destination table or 2) the staging table?
Previously, it was straightforward to understand how the `delete` + `insert` pattern behaved. While a duplicated `unique_key` would probably lead to undesirable results, the `insert` and `delete` queries would still execute successfully.
With the `merge` implementation, I think users will see an error about non-deterministic results if their `unique_key` is not unique! All told, I think this will actually be a good thing, as it should help alert users to bugs in their model code.
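The two approaches can be compared with a small in-memory simulation. This is a sketch under stated assumptions, not Snowflake's engine: rows are Python dicts, `delete_insert` models the old delete-then-insert pattern, and `merge` models `when matched then update` plus `when not matched then insert`. The simulated merge fails only when one destination row matches several staging rows, which is an assumption based on Snowflake's documented duplicate-join behavior; staged duplicates that match nothing are simply inserted. All table contents and helper names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


class NondeterministicMergeError(Exception):
    """Raised when one destination row matches several staging rows."""


def delete_insert(dest: List[Row], staging: List[Row], key: str) -> List[Row]:
    """Old pattern: delete destination rows whose key is staged, then insert all staged rows."""
    staged_keys = {row[key] for row in staging}
    kept = [dict(row) for row in dest if row[key] not in staged_keys]
    return kept + [dict(row) for row in staging]


def merge(dest: List[Row], staging: List[Row], key: str) -> List[Row]:
    """Merge pattern: update matched destination rows in place, insert unmatched staged rows."""
    result: List[Row] = []
    matched_keys = set()
    for row in dest:
        matches = [s for s in staging if s[key] == row[key]]
        if len(matches) > 1:
            # Which staged row should win is ambiguous, so refuse to guess.
            raise NondeterministicMergeError(
                f"{key}={row[key]!r} matches {len(matches)} staging rows")
        if matches:
            result.append(dict(matches[0]))  # when matched then update
            matched_keys.add(row[key])
        else:
            result.append(dict(row))  # destination row left untouched
    # when not matched then insert: every staged row with no destination match
    result.extend(dict(s) for s in staging if s[key] not in matched_keys)
    return result


def by_key(rows: List[Row], key: str) -> List[Row]:
    """Sort rows by key so end states can be compared regardless of order."""
    return sorted(rows, key=lambda r: r[key])


# Hypothetical example data.
dest = [{"id": 1, "status": "pending"}, {"id": 2, "status": "pending"}]
staging = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "pending"}]
dupes = [{"id": 2, "status": "shipped"}, {"id": 2, "status": "cancelled"}]

if __name__ == "__main__":
    # With unique keys, both patterns reach the same end state.
    print(by_key(delete_insert(dest, staging, "id"), "id")
          == by_key(merge(dest, staging, "id"), "id"))
    # With a duplicated staged key, delete + insert silently keeps both rows...
    print(len(delete_insert(dest, dupes, "id")))
    # ...while merge refuses to pick one.
    try:
        merge(dest, dupes, "id")
    except NondeterministicMergeError as exc:
        print("merge failed:", exc)
```

The simulation also surfaces the destination-side case: if the destination already holds two rows with the same key and staging holds one, this `merge` updates both rows without error, whereas `delete_insert` collapses them into one.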
Good catch. Based on what you say, here's what I think: `merge` is definitely the preferable option, and unless there's a really good reason for it, you should get an error if you're trying to insert duplicates. Something is probably wrong with the source data.
Alternatively, we could add support for the `ERROR_ON_NONDETERMINISTIC_MERGE` session parameter (when `FALSE`, Snowflake picks one of the duplicated source rows and uses it for the matching target row), but there doesn't seem to be a clear way to control which row is selected, and I think this is a bad idea anyway. I don't really see the point of inserting a duplicate row. So I agree with the last point in your comment, and I think the current implementation is good.