Include examples of unique_key as a list #4642

Open
1 task done
dbeatty10 opened this issue Dec 13, 2023 · 1 comment
Labels
content Improvements or additions to content incremental models

Comments


dbeatty10 commented Dec 13, 2023

Contributions

  • I have read the contribution docs, and understand what's expected of me.

Link to the page on docs.getdbt.com requiring updates

https://docs.getdbt.com/reference/resource-configs/unique_key

What part(s) of the page would you like to see updated?

unique_key is only discussed in the context of snapshots, but it is also a "key" component for incremental models!

Depending on the context, it can be a column name, a SQL expression (snapshots only), or a list of column names (incremental models only). And for at least the delete+insert incremental strategy, it doesn't need to match only a single row in the data set.
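For illustration, here's a sketch of what the missing docs example could look like: an incremental model configured with unique_key as a list. The model name, columns, and upstream ref are hypothetical.

```sql
-- models/daily_revenue.sql (hypothetical model)
{{
    config(
        materialized='incremental',
        unique_key=['order_date', 'region']  -- list form: valid for incremental models only
    )
}}

select
    order_date,
    region,
    sum(amount) as revenue
from {{ ref('stg_orders') }}
group by 1, 2
```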

What happens when…

  • unique_key is a column name?
    • ✅ Snapshots
    • ✅ Incremental models
  • unique_key is an expression?
    • ✅ Snapshots
    • ❌ Incremental models (error)
  • unique_key is a list of column names?
    • ❌ Snapshots (error)
    • ✅ Incremental models
  • unique_key is not actually unique?
    • ❌ Snapshots (bad data)
    • Incremental models
      • ✅ append
    • ✅ delete+insert
      • ✅ insert_overwrite
      • ❌ merge (error)
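Since passing a list to a snapshot raises an error, a common workaround (sketched here with hypothetical snapshot and column names, not taken from the issue) is to pass a SQL expression that concatenates the columns instead:

```sql
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        strategy='timestamp',
        unique_key="order_id || '-' || region",  -- expression form: valid for snapshots only
        updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}
```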

Error messages for merge when unique_key is not unique

dbt-postgres

22:24:21    Database Error in model merge (models/merge.sql)
  MERGE command cannot affect row a second time
  HINT:  Ensure that not more than one source row matches any one target row.

dbt-redshift

22:24:38    Database Error in model merge (models/merge.sql)
  Found multiple matches to update the same tuple.

dbt-snowflake

22:24:50    Database Error in model merge (models/merge.sql)
  100090 (42P18): Duplicate row detected during DML action
  Row Values: [redacted]

dbt-bigquery

00:21:20    Database Error in model merge (models/merge.sql)
  UPDATE/MERGE must match at most one source row for each target row

dbt-databricks

22:25:16    Runtime Error in model merge (models/merge.sql)
  Cannot perform Merge as multiple source rows matched and attempted to modify the same
  target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
  when multiple source rows match on the same target row, the result may be ambiguous
  as it is unclear which source row should be used to update or delete the matching
  target row. You can preprocess the source table to eliminate the possibility of
  multiple matches. Please refer to
  https://docs.databricks.com/delta/merge.html#merge-error
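One way to avoid these merge errors (a sketch, not from the issue; the source relation and column names are hypothetical) is to deduplicate the source query so each unique_key combination matches at most one row:

```sql
-- keep only the most recent row per unique_key combination before merging
with ranked as (
    select
        *,
        row_number() over (
            partition by order_id, region  -- the unique_key columns
            order by updated_at desc
        ) as rn
    from {{ ref('stg_orders') }}
)

select * from ranked where rn = 1
```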

Effects on snapshots when unique_key is not unique

dbt-postgres

dbt-postgres doesn't use a merge statement for snapshots; it uses deletes and inserts instead. So the snapshot will succeed, but it will contain bad data when unique_key is not unique.

dbt-redshift

dbt-redshift doesn't use a merge statement for snapshots; it uses deletes and inserts instead. So the snapshot will succeed, but it will contain bad data when unique_key is not unique.

dbt-snowflake

dbt-snowflake uses a merge statement for snapshots, so it fails when unique_key is not unique:

00:31:07    Database Error in snapshot snapshot (snapshots/snapshot.sql)
  100090 (42P18): Duplicate row detected during DML action
  Row Values: [redacted]

dbt-bigquery

dbt-bigquery uses a merge statement for snapshots, so it fails when unique_key is not unique:

00:31:30    Database Error in snapshot snapshot (snapshots/snapshot.sql)
  UPDATE/MERGE must match at most one source row for each target row

dbt-databricks

dbt-databricks uses a merge statement for snapshots, so it fails when unique_key is not unique:

01:17:58    Runtime Error in snapshot snapshot (snapshots/snapshot.sql)
  Cannot perform Merge as multiple source rows matched and attempted to modify the same
  target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
  when multiple source rows match on the same target row, the result may be ambiguous
  as it is unclear which source row should be used to update or delete the matching
  target row. You can preprocess the source table to eliminate the possibility of
  multiple matches. Please refer to
  https://docs.databricks.com/delta/merge.html#merge-error

Additional information

Here's a sample of pages that discuss the purpose and behavior of unique_key within either snapshots or incremental models:

Instigating context

By the way, I noticed that this list option is missing from the docs here:
https://docs.getdbt.com/reference/resource-configs/unique_key#use-a-combination-of-two-columns-as-a-unique-key

Originally posted by @cdabel in dbt-labs/dbt-core#4618 (comment)

Discovered while researching dbt-labs/dbt-adapters#150.

@runleonarun (Collaborator)

but it is also a "key" component for incremental models!
😆

@runleonarun runleonarun added incremental models content Improvements or additions to content labels Jan 18, 2024