Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no_time_gap schema test #264

Closed
wants to merge 4 commits into from

Conversation

hundredwatt
Copy link

@hundredwatt hundredwatt commented Jul 31, 2020

This is a:

  • bug fix PR with no breaking changes (please change the base branch to main)
  • new functionality
  • a breaking change

Description & motivation

Add no_time_gap schema test

(Open to suggestions on the name 😉)

This test asserts that the largest gap between records is less than the specified interval. For example, if you have a source table that is loaded once per day, you can use this test to find any gaps larger than 1 day in the data.

Example/Usage:

version: 2

models:
  # tests that events exist in every 2 day interval in the past year
  - name: events
    tests:
      - dbt_utils.no_time_gap:
          field: created_at
          datepart: day
          interval: 2
          since: 365

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to the changelog

@hundredwatt hundredwatt requested a review from clrcrl as a code owner July 31, 2020 16:56
@hundredwatt hundredwatt changed the title No time gap no_time_gap schema test Jul 31, 2020
Copy link
Contributor

@clrcrl clrcrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW I just removed my previous comment as I realized I was thinking about completely the wrong PR!)

Copy link
Contributor

@clrcrl clrcrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of this test — for me, a clearer name would be sequential_values.

I have a few other comments about SQL style and how this is tested — I can dig into those in more detail, but I did just want to reach out first to check if you're still interested in working on this, I did let it go stale for quite a while!

Let me know, and then if you're interested I can do a more fine-grained review — otherwise, I am happy to finish this off myself :)

@hundredwatt
Copy link
Author

hundredwatt commented Dec 27, 2020

@clrcrl yes, I'm still interested. Looking forward to your review :)

By renaming to sequential_values, do you see this as useful for types other than DATETIME/TIMESTAMP?

t3.{{column_name}} < t1.{{column_name}}
{% if since %}
and t3.{{column_name}} >=
{{dbt_utils.dateadd('year', -1, dbt_utils.current_timestamp())}}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, looks like I mistakenly hardcoded this (and line 22) to 1 year ago rather than a dynamic value based on since and datepart

@clrcrl clrcrl mentioned this pull request Jan 4, 2021
11 tasks
@clrcrl
Copy link
Contributor

clrcrl commented Jan 4, 2021

I made the decision here to take the functionality of this PR and rewrite it as a sequential_values test, in PR #318.
Notably:

  • the sequential values test works for both time intervals, and integer / decimal gaps
  • It does not have the concept of a "since" parameter — instead I'd implement a "min value" test in addition to sequential values

Typically, I try my best to provide feedback on pull requests rather than redo code myself — in this case I went against my typical behavior and wrote my own code. Your code was good, however it was a matter of style — the SQL style is really different to our SQL style, and we tend to use seed files for testing rather than a sequence (which have much nicer cross-db compatibility). Given that I also felt there should be design changes to the test (i.e. making it work with both time intervals and integer / decimal intervals), the back-and-forth of reviewing was likely going to be a frustrating process for both parties as there were a ton of changes I had in mind.

I would love to work with you on a future PR — if you have another test in mind, I encourage you to open an issue and we can work on it earlier on in the process. Again, thank you for this PR — the idea behind the test is really good, and I'm excited to add its sibling to our testing suite!

@clrcrl clrcrl closed this Jan 4, 2021
@clrcrl clrcrl mentioned this pull request Apr 5, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants