[Feature] option to generate dbt_scd_id as an integer column instead of a string for performance improvements #10300

Open
Tracked by #10151
graciegoheen opened this issue Jun 12, 2024 · 1 comment
Labels: enhancement (New feature or request), snapshots (Issues related to dbt's snapshot functionality)

Comments

@graciegoheen
Contributor

graciegoheen commented Jun 12, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When dbt generates your snapshot, one of the meta-fields it creates is dbt_scd_id: a unique key generated for each snapshotted record and used internally by dbt.

Currently, dbt_scd_id is a string because of the hashing function used. For the timestamp strategy, dbt_scd_id is a hash of unique_key combined with updated_at, so it is essentially a surrogate key.
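For illustration, here is a minimal Python sketch of how a string key of this kind can be derived. It assumes the shape of dbt's default snapshot hashing (an md5 over the unique_key and updated_at values cast to text, with NULLs coalesced to empty strings and the values joined with '|'). The function name `string_scd_id` is hypothetical, and Python's `str()` only approximates a warehouse cast to varchar.

```python
import hashlib
from datetime import datetime
from typing import Optional


def string_scd_id(unique_key: Optional[object], updated_at: Optional[object]) -> str:
    """Hypothetical sketch: md5 over the pipe-joined key columns, as a hex string.

    Assumption: NULL inputs become '' and values are joined with '|',
    mirroring the general shape of dbt's default snapshot hash.
    """
    parts = ["" if v is None else str(v) for v in (unique_key, updated_at)]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Each (unique_key, updated_at) pair yields a 32-character hex string.
scd_id = string_scd_id(42, datetime(2024, 6, 12, 9, 30))
print(scd_id, len(scd_id))
```

The result is a fixed-width text value, which is why the column is a string today.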

Some folks want dbt_scd_id to instead be an integer for better performance.

If we swapped to an integer, we’d have to use a hashing function that outputs an integer instead of a string.

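  • integers collide a lot more easily than strings: an integer hash (typically 32 or 64 bits) has far fewer possible values than the current string hash
  • if we get a collision, we get unintended behavior, and it fails silently!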
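To put rough numbers on the collision risk, the sketch below uses the standard birthday-bound approximation: the chance that at least one pair among n hashed row versions shares an id, assuming the hash spreads values uniformly over b bits. This is a back-of-the-envelope model, not a measurement of any particular warehouse hash function, and the one-billion row count is an arbitrary example.

```python
import math


def collision_probability(n_rows: int, bits: int) -> float:
    """Approximate P(at least one collision) among n_rows uniform `bits`-bit hashes.

    Birthday bound: P ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    math.expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-n_rows * (n_rows - 1) / 2 ** (bits + 1))


n = 1_000_000_000  # example: one billion snapshot rows
for bits in (32, 64, 128):
    print(f"{bits:>3}-bit id: P(collision) ~= {collision_probability(n, bits):.3g}")
```

Under this model, with one billion rows a 32-bit id is all but certain to collide, a 64-bit id has roughly a 3% chance, and a 128-bit hash is negligible.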

Because of the risk of collision, we wouldn't want to make this the default for all users.

Instead, what if we had a config that allowed you to control the hashing function used when generating dbt_scd_id?
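A minimal Python sketch of the idea, assuming a hypothetical snapshot option named `scd_id_hash` (not an existing dbt setting). The default keeps a string md5 key, and an opt-in value switches to an integer key built from the first 8 bytes of the same digest. The names `make_scd_id` and `HASHERS` are illustrative only; in dbt itself this logic would live in SQL/Jinja rather than Python.

```python
import hashlib
from typing import Callable, Dict, Union

ScdId = Union[str, int]

# Hypothetical registry keyed by the value of an `scd_id_hash` config.
HASHERS: Dict[str, Callable[[bytes], ScdId]] = {
    # Default: string key, same style as today's hashed dbt_scd_id.
    "md5": lambda payload: hashlib.md5(payload).hexdigest(),
    # Opt-in: first 8 digest bytes read as a signed 64-bit integer,
    # so the value fits a BIGINT-style column (with higher collision risk).
    "int64": lambda payload: int.from_bytes(
        hashlib.md5(payload).digest()[:8], "big", signed=True
    ),
}


def make_scd_id(unique_key: object, updated_at: object, scd_id_hash: str = "md5") -> ScdId:
    """Build a dbt_scd_id-style key using the hasher named by the (hypothetical) config."""
    payload = f"{unique_key}|{updated_at}".encode("utf-8")
    try:
        hasher = HASHERS[scd_id_hash]
    except KeyError:
        raise ValueError(f"unknown scd_id_hash: {scd_id_hash!r}") from None
    return hasher(payload)


print(make_scd_id(42, "2024-06-12 09:30:00"))           # string key
print(make_scd_id(42, "2024-06-12 09:30:00", "int64"))  # integer key
```

Keeping the string hash as the default and requiring an explicit opt-in reflects the concern above that integer keys carry a real, silent collision risk.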

Describe alternatives you've considered

Creating a custom materialization to override the outputs from generate_surrogate_key

graciegoheen added the enhancement and snapshots labels on Jun 12, 2024.
graciegoheen changed the title from "[Feature] generate dbt_scd_id as an integer column instead of a string for performance improvements" to "[Feature] option to generate dbt_scd_id as an integer column instead of a string for performance improvements" on Jun 12, 2024.
dbeatty10 removed the triage label on Jun 12, 2024.
@graciegoheen
Contributor Author

slightly related issue -> dbt-labs/dbt-adapters#82
