I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
When dbt generates your snapshot, one of the meta-fields it creates is dbt_scd_id - a unique key generated for each snapshotted record, used internally by dbt.
Currently, dbt_scd_id is a string because the hashing function used to generate it returns a string. For the timestamp strategy, dbt_scd_id is a hash of unique_key combined with updated_at (essentially a surrogate key).
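To make the current behavior concrete, here is a minimal Python sketch of a string-valued surrogate key built from unique_key and updated_at. It assumes an MD5 hex digest and a `|` separator purely for illustration; the function name `scd_id_string` is hypothetical, and the exact hash and casting rules dbt applies on each adapter are not confirmed by this issue.

```python
import hashlib


def scd_id_string(unique_key, updated_at):
    """Hypothetical stand-in for how a string dbt_scd_id could be derived."""
    # Assumption: each part is cast to text and joined with '|' before hashing.
    raw = "|".join(str(part) for part in (unique_key, updated_at))
    # An MD5 hex digest is always a 32-character string, hence a string column.
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Two versions of the same record differ only in updated_at,
# so each snapshotted version gets its own key.
row_v1 = scd_id_string(42, "2024-06-01 00:00:00")
row_v2 = scd_id_string(42, "2024-06-12 00:00:00")
```

Because the digest is rendered as hexadecimal text, the resulting column is a string regardless of the types of the input columns.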
Some folks want dbt_scd_id to instead be an integer for better performance.
If we swapped to an integer, we'd have to use a hashing function that outputs an integer instead of a string. That comes with two risks:
Integer hashes collide a lot more easily than string hashes.
If we get a collision, we get unintended behavior, and it will fail silently!
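The collision concern can be estimated with the standard birthday approximation. The sketch below is illustrative only: the bit widths (32, 64, 128) and the row count are assumptions chosen for comparison, not figures from this issue, and `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n_rows, hash_bits):
    """Approximate chance that at least two of n_rows hashes collide."""
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)).
    exponent = -n_rows * (n_rows - 1) / (2.0 ** (hash_bits + 1))
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is very close to 0.
    return -math.expm1(exponent)


# Compare a 32-bit integer, a 64-bit integer, and a 128-bit digest
# for a snapshot table with one billion rows (an assumed size).
for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000_000, bits))
```

Under these assumptions a 32-bit hash is all but guaranteed to collide, a 64-bit hash has a chance on the order of a few percent, and a 128-bit digest remains negligible, which is why an integer key is riskier as a default.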
Because of the risk of collision, we wouldn't want to make this the default for all users.
Instead, what if we had a config that allowed you to control the hashing function used when generating dbt_scd_id?
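As a rough illustration of what such a config could look like, here is a Python sketch in which a setting selects the hashing function. Everything named here is hypothetical: the config key `scd_id_hash`, the option names `md5_hex` and `md5_int64`, and the helper `build_scd_id` are not existing dbt features, and truncating an MD5 digest to 64 bits is just one possible way to get an integer.

```python
import hashlib


def md5_hex(raw):
    # String output: 32 hexadecimal characters (the assumed current behavior).
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def md5_int64(raw):
    # Integer output: keep the first 8 bytes of the digest as a signed 64-bit
    # value so it fits a typical BIGINT column. Collisions are more likely.
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Hypothetical registry of selectable hashing functions.
HASH_FUNCTIONS = {"md5_hex": md5_hex, "md5_int64": md5_int64}


def build_scd_id(unique_key, updated_at, config):
    # 'scd_id_hash' is an invented config key; the string hash stays the default.
    name = config.get("scd_id_hash", "md5_hex")
    if name not in HASH_FUNCTIONS:
        raise ValueError(f"unknown scd_id_hash: {name!r}")
    raw = f"{unique_key}|{updated_at}"
    return HASH_FUNCTIONS[name](raw)


default_id = build_scd_id(42, "2024-06-12 00:00:00", {})
integer_id = build_scd_id(42, "2024-06-12 00:00:00", {"scd_id_hash": "md5_int64"})
```

Keeping the string hash as the default means existing snapshots are unaffected, and only users who opt in accept the higher collision risk of an integer key.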
Describe alternatives you've considered
Creating a custom materialization to override the outputs from generate_surrogate_key
graciegoheen changed the title from "[Feature] generate dbt_scd_id as an integer column instead of a string for performance improvements" to "[Feature] option to generate dbt_scd_id as an integer column instead of a string for performance improvements" on Jun 12, 2024