Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshots with check option - allow custom date field to be used when setting dbt_date_from and dbt_date_to fields. #1844

Closed
james-the-first opened this issue Oct 21, 2019 · 2 comments · Fixed by #3376
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snapshots Issues related to dbt's snapshot functionality

Comments

@james-the-first
Copy link

james-the-first commented Oct 21, 2019

Overview of Change

Essentially we would like to be able to supply the following config so that a snapshot with check strategy honours the updated_at field that we supply when setting the dbt_date_from/to fields.

 {{
        config(
          target_database='analytics',
          target_schema='snapshots',
          unique_key='id',
          strategy='check',
          updated_at='updated_at',
        )
    }}

We would like to use the snapshot feature to manage type 2 changes on our dimensions (products, customers etc).

We need to use check mode (not timestamp) so we can look for record attributes that change (as well as new/deleted records).

In our staging layer we have a dw_load_datetime column on every staged dimension that represents when the data was processed. This is the timestamp from Airflow and importantly it allows us to replay historic runs (e.g. reload data for the past 3 days).

Question - can we tell dbt to use this dw_load_datetime in the snapshot when setting the dbt_date_from and dbt_date_to fields?

Currently dbt uses the current time to do this. We can see you support this use case when in timestamp mode - but we weren’t sure why this couldn’t also apply in check mode?

@james-the-first james-the-first added enhancement New feature or request triage labels Oct 21, 2019
@drewbanin drewbanin added triage and removed triage labels Oct 21, 2019
@drewbanin drewbanin added snapshots Issues related to dbt's snapshot functionality good_first_issue Straightforward + self-contained changes, good for new contributors! and removed triage labels Jan 21, 2020
@PJGaetan
Copy link
Contributor

PJGaetan commented May 13, 2021

Hello, first time contributor here, I've started to look into the issue.

I feel like the only change should be here. I would get the updated_at option, the same way invalidate_hard_deletes is.

I'm not sure what kind of tests needs to be added.
I see how to add a test to check if the new option is working correctly.
But how to assert that rows have been invalidated with updated_at and not the current timestamp ?

Edit : I would be happy to open a PR to show where I'm at if it's ok with you

@jtcohen6
Copy link
Contributor

@PJGaetan I think that's right! You could replace the line you've identified with:

{% set updated_at = config.get('updated_at', snapshot_string_as_time(now)) %}

As far as testing this change, we already have a number of integration tests related to 'check' strategy snapshots in test/integration/004_simple_snapshot_test. In particular:

I think a test for this could be as simple as copying/adapting one of those. After ensuring that the source table being snapshotted contains a date/timestamp column (the existing ones may not), create a snapshot with both the 'check' strategy and updated_at config set, run it, and confirm the dbt_updated_at column value in the resulting snapshot table matches the updated_at column value from the initial table being snapshotted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snapshots Issues related to dbt's snapshot functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants