Restore ability to utilize `updated_at` for check_cols snapshots #5077
…sing the check_cols strategy
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Doug Beatty.
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
The change looks good! Can we add a test for this case so we don't accidentally break this next time?
Will work on some test cases for this.
Nothing stands out, though I'm not the most proficient in testing snapshots, so it might be a good idea to wait for one more approval 👍
This looks great! Thanks @dbeatty10 for the fast fix and test!
Left one question/comment to make sure I understood your test correctly.
def test_simple_snapshot(project):
I want to make sure I understand the test case correctly.
`dbt seed` creates the `snapshot_check_cols_updated_at_expected` table. All data in `snapshot_check_cols_updated_at_actual` is generated purely by the logic in `snapshot.sql`.
For the `run_dbt(["snapshot", "--vars", "{version: 3, updated_at: 2016-07-03}"])` command, no new data is generated in the snapshot table because no data was modified.
This test is also designed to make sure that `dbt_updated_at` actually holds the time we passed in through `--vars`, which is the main change this PR makes.
This is a great test! I would recommend adding a few short comments to document the intention of each command.
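The behavior the test checks can be pictured with a toy model in plain Python. This is only a sketch under stated assumptions, not dbt's implementation: the function `snapshot_check_cols`, the sample rows, and the dates for versions 1 and 2 are hypothetical (only `2016-07-03` for version 3 appears in the review comment). It shows the two properties discussed above: every inserted row carries the supplied `updated_at` value in `dbt_updated_at`, and a version that only deletes a row adds nothing when hard deletes are not invalidated.

```python
from datetime import datetime


def snapshot_check_cols(snapshot, source_rows, check_cols, updated_at, key="id"):
    """Toy check_cols snapshot (hypothetical helper, not dbt code).

    Mutates `snapshot` in place and returns the number of rows inserted.
    Rows missing from `source_rows` are left alone, which mimics running
    without invalidating hard deletes.
    """
    # Index the currently valid snapshot row for each key.
    current = {r[key]: r for r in snapshot if r["dbt_valid_to"] is None}
    inserted = 0
    for row in source_rows:
        existing = current.get(row[key])
        if existing is not None and all(existing[c] == row[c] for c in check_cols):
            continue  # no checked column changed, so nothing to record
        if existing is not None:
            existing["dbt_valid_to"] = updated_at  # close out the old version
        snapshot.append({
            **row,
            "dbt_updated_at": updated_at,  # the supplied timestamp, not "now"
            "dbt_valid_from": updated_at,
            "dbt_valid_to": None,
        })
        inserted += 1
    return inserted


# Three hypothetical versions of the source data.
v1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
v2 = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}]  # one row changed
v3 = [{"id": 1, "name": "a"}]                          # one row deleted

snapshot = []
n1 = snapshot_check_cols(snapshot, v1, ["name"], datetime(2016, 7, 1))
n2 = snapshot_check_cols(snapshot, v2, ["name"], datetime(2016, 7, 2))
n3 = snapshot_check_cols(snapshot, v3, ["name"], datetime(2016, 7, 3))
```

In this toy run the third call inserts no rows, matching the reviewer's reading that the version 3 command generates no new snapshot data.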
Thank you for this excellent feedback @ChenyuLInx!
You are correct in your understanding of the test case. I've updated the description of the pull request to reflect your feedback. Or were you thinking of adding these as comments within the code itself?
Added some comments to explain the testing approach.
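The testing approach also depends on comparing columns whose names start with `dbt_`, which this PR makes optional. A rough sketch of that idea follows, assuming a simple list-of-dicts representation of a relation. The helpers `columns_to_compare` and `relations_equal` and the flag name `include_dbt_cols` are illustrative only and are not taken from dbt's test utilities.

```python
def columns_to_compare(columns, include_dbt_cols=False):
    """Pick the columns used for comparison (hypothetical helper).

    By default, columns starting with "dbt_" are skipped; passing
    include_dbt_cols=True keeps them.
    """
    if include_dbt_cols:
        return list(columns)
    return [c for c in columns if not c.lower().startswith("dbt_")]


def relations_equal(rows_a, rows_b, columns):
    """Compare two relations on the given columns, ignoring row order."""
    def project(rows):
        return sorted((tuple(r[c] for c in columns) for r in rows), key=repr)
    return project(rows_a) == project(rows_b)


# Hypothetical rows that differ only in dbt_updated_at.
expected = [{"id": 1, "name": "a", "dbt_updated_at": "2016-07-01"}]
actual = [{"id": 1, "name": "a", "dbt_updated_at": "2016-01-01"}]
cols = ["id", "name", "dbt_updated_at"]

# Skipping dbt_ columns hides the difference; including them exposes it.
equal_without_dbt = relations_equal(expected, actual, columns_to_compare(cols))
equal_with_dbt = relations_equal(
    expected, actual, columns_to_compare(cols, include_dbt_cols=True)
)
```

With the `dbt_` columns skipped, a wrong `dbt_updated_at` value would go unnoticed, which is why a test of this fix needs them included.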
* Restore ability to configure and utilize `updated_at` for snapshots using the check_cols strategy
* Changelog entry
* Optional comparison of column names starting with `dbt_`
* Functional test for check cols snapshots using `updated_at`
* Comments to explain the test implementation
(cherry picked from commit d09459c)

…) (#5126)
* Restore ability to configure and utilize `updated_at` for snapshots using the check_cols strategy
* Changelog entry
* Optional comparison of column names starting with `dbt_`
* Functional test for check cols snapshots using `updated_at`
* Comments to explain the test implementation
(cherry picked from commit d09459c)
Co-authored-by: Doug Beatty

…-labs#5077)
* Restore ability to configure and utilize `updated_at` for snapshots using the check_cols strategy
* Changelog entry
* Optional comparison of column names starting with `dbt_`
* Functional test for check cols snapshots using `updated_at`
* Comments to explain the test implementation
Restore ability to configure and utilize `updated_at` for snapshots using the check_cols strategy
resolves #5076
Description
#5076 describes a breaking change, and this PR converts it to a non-breaking change in the most minimal way possible.
Details
* `check_relations_equal` method excluded columns beginning with `dbt_`, which were the exact columns we want to compare, so modification was necessary
* `timestamp_col` column uses dates in January of 2016 and the `updated_at` configuration uses dates in July of 2016
Testing approach
* `dbt seed` to create the expected relation (`snapshot_check_cols_updated_at_expected`)
* `dbt snapshot` commands to create the actual relation (`snapshot_check_cols_updated_at_actual`)
* `my_snapshot.sql` contains logic that can switch between 3 different versions of the data
The functional test is designed to make sure `dbt_updated_at` matches the timestamp expression that we passed in through `--vars`.
The only change within the 3rd version of the data deletes a single row. Because this test doesn't cover the `invalidate_hard_deletes=True` option, the 3rd version of the data makes no updates. However, this version is included to make it easier to test for the expected behavior when invalidating hard deletes. See "future ideas to consider" below.
Pros
* `dbt_*` columns within the comparison of relations
* `invalidate_hard_deletes=False`
Cons
Future ideas to consider
* `dbt_` only for those tests that need it.
Checklist