Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use test materialization when executing ✨generic✨ tests #3192

Closed
kwigley opened this issue Mar 24, 2021 · 2 comments · Fixed by #3286
Closed

Use test materialization when executing ✨generic✨ tests #3192

kwigley opened this issue Mar 24, 2021 · 2 comments · Fixed by #3286
Assignees
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request

Comments

@kwigley
Copy link
Contributor

kwigley commented Mar 24, 2021

Current Behavior

generic (aka schema) tests are currently executed using the python api directly.
https:/fishtown-analytics/dbt/blob/77c10713a325d2bee91d1822951ce5d91ccc3278/core/dbt/task/test.py#L84-L85

Desired Behavior

Schema tests should be executed and handled by the same test materialization used for data tests. The goal here is to maintain existing behavior handling results (simply interpreting number of rows returned by count(*) for the test)

Things to consider

Returning results for most schema tests is straight forward, we can simply remove count(*) and return the results of the query fulfilling the test, the materialization will take care of the rest. In some cases, this is not straight forward for tests that are simple pass/fail and do not necessarily return results.

Example:
https:/fishtown-analytics/dbt/blob/749f87397ec1e0a270b2e09bd8dbeb71862fdb81/test/integration/005_simple_seed_test/macros/schema_test.sql

We will have to create a way to interpret different types of results as part of this.

@kwigley kwigley added enhancement New feature or request dbt tests Issues related to built-in dbt testing functionality labels Mar 24, 2021
@kwigley kwigley changed the title Use test materialization when executing schema tests Use test materialization when executing ✨generic✨ tests Mar 24, 2021
@kwigley kwigley self-assigned this Apr 8, 2021
@ezequielberto
Copy link

ezequielberto commented Jul 22, 2021

Hello!
I think one of the issues with this change is when one defines a test with threshold. For example, I had the following custom test macro before the update:

{% macro test_relationships(model, to, field) %}

{% set column_name = kwargs.get('column_name', kwargs.get('from')) %}
{% set tolerate_errors = kwargs.get('tolerate_errors', 0) %}

with validation_errors as (
    select
        left_.id as left_id,
        right_.id as right_id
    from (select {{ column_name }} as id from {{ model }}) as left_
        left join (select {{ field }} as id from {{ to }}) as right_
            on left_.id = right_.id
    where left_.id is not null
        and right_.id is null
),
count_errors as (
    select count(*) as n_errors
    from validation_errors
)
select
    case when n_errors <= {{ tolerate_errors }}
      then 0
      else n_errors
    end as result
from count_errors

{% endmacro %}

I want it to fail if I have more than n errors, and to know how many errors they were. It was easy before, since we retrieved the value of count(*).

We have several tests this way, so actually we'd rather stay with the old testing mode. Is there a way of choosing the testing method somewhere (e.g.: dbt_project.yml)?

@jtcohen6
Copy link
Contributor

@ezequielberto I'm hopeful that a migration to the new testing mechanisms in v0.20 is smoother than it may first appear. In fact, I think you have some options.

  1. Leverage new error_if + warn_if configs: The behavior you're after with tolerate_errors is now available via built-in test configs. This could be as simple as:
with validation_errors as (
    select
        left_.id as left_id,
        right_.id as right_id
    from (select {{ column_name }} as id from {{ model }}) as left_
        left join (select {{ field }} as id from {{ to }}) as right_
            on left_.id = right_.id
    where left_.id is not null
        and right_.id is null
)
select * from validation_errors

Then replace tolerate_errors with error_if and/or warn_if in the test definition:

relationships:
  to: ...
  field: ...
  error_if: 10
  warn_if: 50
  1. Light rewrite just to get this working: The test defined above is perfectly workable up until the last CTE, where it returns numeric 0 or 1 in a case when statement. Instead, you could move the case when comparison to a where filter:
select * from count_errors
where n_errors > {{ tolerate_errors }}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants