Incorrect compilation error thrown for snapshots using check strategy and missing check_cols parameter #2787

kingfink · 2020-09-24T14:10:00Z

Describe the bug

If the check_cols parameter is missing when using the check strategy for a snapshot, a misleading compilation error is thrown:

Compilation Error in snapshot snapshot_table_name (snapshots/snapshot_table_name.sql) at path []: 'updated_at' is a required property

Steps To Reproduce

Create a snapshot with the following contents:

{% snapshot snapshot_test %}

{{
  config(
    target_database='analytics',
    target_schema='analytics_snapshots',
    unique_key='id',
    strategy='check'
  )
}}

WITH
  temp AS (
    SELECT 1 AS col
  )

SELECT
  *
FROM temp

{% endsnapshot %}

Issue the following command:

$ dbt snapshot
Running with dbt=0.18.0
Encountered an error:
Compilation Error in snapshot snapshot_test (snapshots/snapshot_test.sql)
  at path []: 'updated_at' is a required property

Expected behavior

I'd expect the compilation error to be specific to the strategy used, i.e. when using check:

$ dbt snapshot
Running with dbt=0.18.0
Encountered an error:
Compilation Error in snapshot snapshot_test (snapshots/snapshot_test.sql)
  at path []: 'check_cols' is a required property when 'check' strategy is used

Screenshots and log output

See above

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.18.0
   latest version: 0.18.0

Up to date!

Plugins:
  - bigquery: 0.18.0
  - snowflake: 0.18.0
  - redshift: 0.18.0
  - postgres: 0.18.0

The operating system you're using:

The output of python --version:

Python 3.7.8

Additional context

I'd love to take a stab at fixing this issue if someone can point me in the right direction 🙂


jtcohen6 · 2020-09-24T14:27:02Z

Good call, @kingfink!

I think this gets into some of the trickier code in dbt, which is where we parse files and construct type-secure dbt objects. I believe the relevant lines are here, defining TimestampSnapshotConfig and CheckSnapshotConfig:
https://github.com/fishtown-analytics/dbt/blob/873d76d72ce17446900e2e58593bfeaff096eb2a/core/dbt/contracts/graph/model_config.py#L569-L615

I'm thinking the issue may be that, since a snapshot config missing both updated_at and check_cols fails to meet both contracts, dbt returns the first error rather than the most relevant one. If there's a way to read the strategy and check against only the corresponding snapshot config, I agree that'd be an improvement for user clarity.
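
To make the hypothesis above concrete, here is a minimal sketch in plain Python. It is not dbt's code and does not use jsonschema; REQUIRED_KEYS, errors_for, and first_error_across_contracts are hypothetical names invented for illustration, and the two "contracts" are reduced to a list of required keys. It only shows how reporting the first failure across both contracts can surface the timestamp error for a check-strategy config.

```python
from typing import Dict, List

# Hypothetical, simplified stand-ins for the two snapshot contracts:
# each contract is reduced to the config keys it requires.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "timestamp": ["updated_at"],
    "check": ["check_cols"],
}


def errors_for(contract: str, config: dict) -> List[str]:
    """Return one message per required key missing from config."""
    return [
        f"'{key}' is a required property"
        for key in REQUIRED_KEYS[contract]
        if key not in config
    ]


def first_error_across_contracts(config: dict) -> str:
    """Collect failures from every contract and report only the first one."""
    all_errors: List[str] = []
    for contract in REQUIRED_KEYS:  # dicts preserve insertion order
        all_errors.extend(errors_for(contract, config))
    return all_errors[0] if all_errors else ""


# The config from the bug report: strategy is 'check', check_cols is missing.
config = {"unique_key": "id", "strategy": "check"}
print(first_error_across_contracts(config))
# prints: 'updated_at' is a required property
```

Because the strategy value is never consulted, the message depends only on which contract happens to be tried first.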

kingfink · 2020-09-24T17:05:51Z

Thanks for the context @jtcohen6 !

Are there any other examples you know of (either in the dbt codebase or somewhere else) where I can see this kind of branching logic in action?

"read the strategy and check against only the corresponding snapshot config"

I'm guessing that trying to determine relevance the way this function does is not what we want to do here?

https://github.com/fishtown-analytics/dbt/blob/873d76d72ce17446900e2e58593bfeaff096eb2a/core/dbt/contracts/graph/model_config.py#L463-L472

jtcohen6 · 2020-09-24T18:14:15Z

_relevance_without_strategy sure feels like a promising start...

@drewbanin Any chance you could give a more specific pointer here?

drewbanin · 2020-09-24T22:39:24Z

So - Snapshot config validation is kind of tricky! A snapshot can either be:

  - configured with the check strategy and should be validated as such
  - configured with the timestamp strategy and should be validated as such

A snapshot config can also be configured with a user-defined strategy, which doesn't seem to happen too often in practice. I think that is where some of the misdirection in this part of the codebase comes from. When we see a non-compliant snapshot config, it's not obvious if it's errant or if it's just a configuration for a user-configured strategy.

Rather than hacking this _relevance_without_strategy method, I'd probably just add some logic around here which explicitly checks the data.config.strategy value and uses a specific validator for the CheckSnapshotConfig or the TimestampSnapshotConfig class as needed.

https://github.com/fishtown-analytics/dbt/blob/873d76d72ce17446900e2e58593bfeaff096eb2a/core/dbt/contracts/graph/model_config.py#L476-L491

I got some sample code working locally if it's helpful @kingfink otherwise this should be a good starting point!
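
The three cases described above (check, timestamp, user-defined) can be sketched as a simple dispatch on the strategy value. This is a plain-Python illustration under stated assumptions, not the code drewbanin ran locally: REQUIRED_BY_STRATEGY and validate_by_strategy are hypothetical names, and each built-in strategy's contract is reduced to a list of required keys.

```python
from typing import Any, Dict, List

# Hypothetical reduction of each built-in strategy's contract
# to the keys it requires.
REQUIRED_BY_STRATEGY: Dict[str, List[str]] = {
    "check": ["check_cols"],
    "timestamp": ["updated_at"],
}


def validate_by_strategy(config: Dict[str, Any]) -> List[str]:
    """Validate only against the contract that matches config['strategy']."""
    required = REQUIRED_BY_STRATEGY.get(config.get("strategy"))
    if required is None:
        # Unknown or user-defined strategy: this sketch cannot know which
        # keys it needs, so it reports nothing rather than guessing.
        return []
    return [
        f"'{key}' is a required property"
        for key in required
        if key not in config
    ]


print(validate_by_strategy({"unique_key": "id", "strategy": "check"}))
# prints: ["'check_cols' is a required property"]
```

With the strategy read first, a check-strategy config is never measured against the timestamp contract, so the updated_at message cannot appear for it.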

kingfink · 2020-09-25T16:40:00Z

OK I took a stab at it below, is this along the lines of what you're thinking?

@dataclass
class SnapshotWrapper(JsonSchemaMixin):
    """This is a little wrapper to let us serialize/deserialize the
    SnapshotVariants union.
    """
    config: SnapshotVariants  # mypy: ignore

    @classmethod
    def validate(cls, data: Any):
        if data['config'].get('strategy') == 'check':
            schema = _validate_schema(CheckSnapshotConfig)
        elif data['config'].get('strategy') == 'timestamp':
            schema = _validate_schema(TimestampSnapshotConfig)
        else:
            schema = _validate_schema(cls)

        validator = jsonschema.Draft7Validator(schema)

        error = jsonschema.exceptions.best_match(
            validator.iter_errors(data['config'] if data['config'].get('strategy') in ['check', 'timestamp'] else data),
            key=_relevance_without_strategy,
        )

        if error is not None:
            raise ValidationError.create_from(error) from error

Thank you so much for helping me out with this @drewbanin !

drewbanin · 2020-09-25T17:26:48Z

@kingfink no joke that is actually almost exactly the diff that I made locally while testing this one :)

I have some minor thoughts about this, but they're totally the kind of thing we could iterate on in the PR. At a high level:

  - we should use data.get('config', {}).get('strategy') to make sure that if data is malformed, we don't raise an unhelpful KeyError (this is validation code, after all!)
  - it probably makes more sense to assign a different value for data in each of these branches, e.g.:

    def validate(cls, data: Any):
        config = data.get('config', {})
        if config.get('strategy') == 'check':
            schema = _validate_schema(CheckSnapshotConfig)
            to_validate = config
        ....
        error = jsonschema.exceptions.best_match(
            validator.iter_errors(to_validate),
            key=_relevance_without_strategy,
        )

Just my 2 cents though - happy to litigate them in a PR if you're interested in contributing a fix for this issue :D
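
Both suggestions above can be seen in a small standalone sketch. It is an illustration only, not the diff that was merged: select_target is a hypothetical helper, and the contract classes named in the thread are represented by their names as strings so the example runs without dbt or jsonschema.

```python
from typing import Any, Dict, Tuple


def select_target(data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (contract name, payload to validate) for a snapshot node.

    Contract names are plain strings here, standing in for the classes
    mentioned in the thread.
    """
    # .get with a default avoids a KeyError when 'config' is absent.
    config = data.get("config", {})
    strategy = config.get("strategy")
    if strategy == "check":
        return "CheckSnapshotConfig", config
    if strategy == "timestamp":
        return "TimestampSnapshotConfig", config
    # Unknown or user-defined strategy: fall back to the wrapper and
    # validate the whole payload, as in the original proposal.
    return "SnapshotWrapper", data


malformed: Dict[str, Any] = {}
try:
    malformed["config"].get("strategy")  # direct indexing, as first proposed
except KeyError as exc:
    print("direct indexing raised KeyError:", exc)

print(select_target(malformed))
print(select_target({"config": {"strategy": "check", "unique_key": "id"}}))
```

Choosing the payload in the same branch as the contract removes the need for the inline conditional inside iter_errors. Note that this sketch still assumes config, when present, is a dict.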

kingfink · 2020-09-25T21:01:17Z

@drewbanin agree with you :)

Just opened a PR (and submitted the CLA...not sure if that resolves itself)

kingfink added the bug and triage labels Sep 24, 2020

jtcohen6 removed the triage label Sep 24, 2020

kingfink mentioned this issue Sep 25, 2020 in "Fix snapshot compilation error #2791" (merged)

jtcohen6 closed this as completed in #2791 Sep 30, 2020
