Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1436] Treat warnings as errors for specific events, based on user configuration #6165

Closed
jtcohen6 opened this issue Oct 27, 2022 · 17 comments · Fixed by #6520
Closed

[CT-1436] Treat warnings as errors for specific events, based on user configuration #6165

jtcohen6 opened this issue Oct 27, 2022 · 17 comments · Fixed by #6520
Assignees
Labels
cli enhancement New feature or request logging
Milestone

Comments

@jtcohen6
Copy link
Contributor

Today, --warn-error is a simple on/off switch that turns all warnings into errors. We've gotten multiple requests (#4383, #6080) from users who want finer-grained control for raising errors on specific types of warnings.

We should allow users to configure exactly which event classes they want to raise as exceptions instead of warnings. This will be enabled by #6064, which will turn all warn_or_error call sites into structured events.

Where should this configuration live? It feels like "runtime" config, rather than something that goes in project code. But it might be awkward to pass as a CLI flag or env var. The best places feels like the "user config" which currently lives in profiles.yml.

config:
  warn_error: true     # all of them

config:
  warn_error_events:  # scoped down to specific events
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath
def warn_or_error(event, node):
    # pseudo code!
    if flags.WARN_ERROR or type(event) in flags.WARN_ERROR_EVENTS:
        from dbt.exceptions import raise_compiler_error
        raise_compiler_error(scrub_secrets(event.info.msg, env_secrets()), node)
    else:
        fire_event(event)
@github-actions github-actions bot changed the title Treat warnings as errors for specific events, based on user configuration [CT-1436] Treat warnings as errors for specific events, based on user configuration Oct 27, 2022
@emmyoop
Copy link
Member

emmyoop commented Nov 3, 2022

We need to define how we want this to behave a bit more concretely by allowing include or exclude of specific warn events.

Error for all events except the ones listed

config:
  warn_error: true     # all of them

config:
  warn_error_override:
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath

Warn for all events except the one listed

config:
  warn_error: false     # none of them (default)

config:
  warn_error_override:
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath

Question: Should there be separate configs for what to include/exclude from the list based on if warn_error is true/false or a single config that just swaps including or excluding specific events.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Nov 3, 2022

I like the framing of this as _override!

@emmyoop
Copy link
Member

emmyoop commented Nov 4, 2022

As part of this work, we should remove warn_or_error entirely to make the warn_error config work for all warnings, not just a subset.

This will involve changing all callsites for warn_or_error to fire_event. dbt-snowflake uses warn_or_error in connections.py.

Note: This could be a breaking change for community adapters if they use warn_or_error.

@github-christophe-oudar

It would be great to be able to scope the related configuration so that in a large project you could keep warnings on a project/directory/package while raising errors on others.

It would look like:

models:
 my_project:
    team1:
      warn_error_override:
        include:
          - NoNodesForSelectionCriteria
          - UnusedResourceConfigPath

include / exclude would allow simple declaration for those site specific configurations.

@jtcohen6
Copy link
Contributor Author

@github-christophe-oudar Thanks for this! After chatting about it with the group, it would be significantly trickier to implement this at the level of the per-project / per-model config, rather than a consistent value for the entire invocation. We're planning to implement the simpler version first, and we should open a new ticket to represent the more granular extension.

After further discussion, we're also not thrilled about having two configs for this (warn_error + warn_error_override), where the behavior of the second config changes based on the behavior of the first. Other options discussed:

warn_error: true | false | [dict??] | [list??]

# raise errors for any/all types of warnings
warn_error: true

# raise errors for ZERO types of warnings
warn_error: false

# raise errors for any/all types EXCEPT these two
warn_error:
  default: true
  except:
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath 

# ONLY raise errors for these two types
warn_error:
  default: false
  except:
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath

Thoughts:

  • I tried writing the above using prefix decorators (+ meaning "add," - meaning "remove"), but I wasn't happy with it.
  • Should except (or overrides) have a different name for "except add these" versus "except remove these"?
  • We could support a sugarier version of the last option, that matches the original proposal in this issue (while still holding onto us the full optionality of the default/except):
# same as 'default: false' + 'except' list
warn_error:
  - NoNodesForSelectionCriteria
  - UnusedResourceConfigPath

@github-christophe-oudar

I agree that supporting a global level should cover most use cases.

I agree that playing prefix decorator would be pretty confusing inside a YAML file.
default doesn't feel very explicit though.
Yet it's hard to find a better wording, maybe any_warning: true?

@leahwicz
Copy link
Contributor

@nathaniel-may comment- Choose what to be warned on and then raise warnings to errors

What is the scenario when there is a mix of warnings and errors? Warning for config missing or isn't there.

@jtcohen6
Copy link
Contributor Author

My understanding of Nate's proposal:

  • Flag specific warnings to be ignored
  • warn_error is still a simple on/off, for all non-ignored warnings
  • We'd want to disambiguate between warnings in the sense of "bad configuration," and warnings in the sense of tests + source freshness checks with a certain status result

The appeal of that approach is it's more in keeping with the standard practice of other compilers. The downside is losing some of the flexibility that the more-complex configuration (proposed in this issue) would enable.

@emmyoop
Copy link
Member

emmyoop commented Nov 18, 2022

Looking into the code I would say we actually have three types of warnings: deprecations, bad configs and tests + source freshness checks. Both deprecations and configs are already implemented differently since they are both utilizing warn_or_error where tests and source configs are setting the result status based on if the warn-error flag is set but do not use warn_or_error.

  1. Deprecations - it seems safe to never allow these to be treated as errors.
  2. Bad Configs - We have had community members asking to be able to turn specific warnings of this class into errors in the past(linked in original issue body). The big question is if this is something we want to support (I think yes) and to what extent/customizability?
  3. tests + source freshness checks - To allow this to be triggered to fail, it would just require using a separate flag since it's really one type of warning, just a different type. The warn-error flag could just get deprecated at some point. The problem here would be clearly defining what wins (or throwing an error when old and new flags are used within the same project)

Looking at other compilers:
C# covers its bases with

TreatWarningsAsErrors / -warnaserror: Treat all warnings as errors
WarningsAsErrors / -warnaserror: Treat one or more warnings as errors
WarningsNotAsErrors / -warnnotaserror: Treat one or more warnings not as errors
NoWarn / -nowarn: Set a list of disabled warnings.

GCC actually does something similar where you can

  1. Make all warnings errors
  2. Make a specified warning an error
  3. Specify a warning to not become an error even when you've already indicated all warnings should be errors

Making a note to not lose it: mypy does something similar with error codes (disable_error_codes and enable_error_codes) but is not exactly the same as what we're trying to do. The interesting thing is that it's two separate configs and they also make the executive decision that (enable_error_codes)... will override disabled error codes from the disable_error_code option. So if both are filled it's well documented what wins while also having explicit fields.

@MichelleArk
Copy link
Contributor

MichelleArk commented Dec 6, 2022

A functionally similar configuration syntax was recently proposed in the dbt should know more about semantic information issue. It turns out that entity definitions also need to provide an overall true/false option, plus some additional include/exclude modifications.

Implementing the same configuration syntax for warn_error would need some custom validation (expecting valid exception class names, not column names), but would have the benefit of a consistent experience with what's being proposed for entities. Below are a few examples of the syntax applied to the warn_error config:

# all warnings are treated as exceptions, same as warn_error: true
warn_error: 
  include: *  

# all warnings are treated as exceptions except NoNodesForSelectionCriteria, UnusedResourceConfigPath
warn_error: 
  include: * 
  exclude: 
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath 

# only NoNodesForSelectionCriteria, UnusedResourceConfigPath are treated as errors
warn_error: 
  include: 
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath 

# invalid - exclude only supported when include is '*'
warn_error: 
  include:
    - ThisEvent
    - ThatEvent 
  exclude: 
    - NoNodesForSelectionCriteria
    - UnusedResourceConfigPath 

# invalid
warn_error: 
  exclude: * 

@jtcohen6 jtcohen6 added this to the v1.4 milestone Dec 6, 2022
@callum-mcdata
Copy link
Contributor

An observation - it appears yml doesn't support non-quoted special characters such as *. See link here and explanation from Stack Overflow.

YAML doesn't require quoting most strings but you'll want to quote special characters if they fall within a certain list of characters.

This implies that the syntax will be the slightly less clean:

warn_error:
  include: '*'

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jan 2, 2023

c'est comme ça

We could support "all" as an alias:

warn_error:
  include: all # same as "*"

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jan 4, 2023

One thing to check during implementation: The spec looks great as yaml, but dbt Cloud users don't have the ability to write/edit profiles.yml, which is where UserConfig lives right now.

It will need to be possible to pass these data structures in as an env var (DBT_WARN_ERROR) or CLI flag (--warn-error) as well (even if it's super ugly to look at).

Something like:

$ DBT_WARN_ERROR='{"include": "*", "exclude": ["NoNodesForSelectionCriteria", "UnusedResourceConfigPath"}' dbt run
$ dbt --warn-error '{"include": "*", "exclude": ["NoNodesForSelectionCriteria", "UnusedResourceConfigPath"}' run

@MichelleArk
Copy link
Contributor

MichelleArk commented Jan 4, 2023

That could work nicely! One edge case I can think of would be for entity definitions where a column name is actually called 'all' - how would dbt disambiguate between a column named 'all' vs 'include all columns'? I don't think this is an issue for the warn_error configuration since dbt-core provides the exception names.

One way to disambiguate between a column named 'all' vs the directive to include all column names without complex validation would be based on the presence of 'all' as a string vs a list item. For example:

# include all columns
incude: all 

vs

# column named 'all'
include:
  - all 

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jan 4, 2023

based on the presence of 'all' as a string vs a list item

my thought exactly

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jan 4, 2023

& the good old fashioned --warn-error bool will also want to keep working, with the same effect as "include all":

$ dbt --warn-error run
$ dbt --warn-error '{"include": "all"}' run

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jan 4, 2023

(summary from follow-up convo w @MichelleArk)

We can't have one CLI flag that's both a boolean and also a dict. argparse isn't happy with that, click isn't happy with that, and if we're being honest with ourselves, we shouldn't be too happy about doing it either.

Let's make a new config (+ flag/env var), call it warn_error_options, data structure is always a dictionary that looks like {"include": str | list, "exclude": list}. This will be distinct from and mutually exclusive with the existing boolean config/flag/var named warn_error. Note: This means it would not be possible to supply warn_error_options in UserConfig and override with plain old --warn-error in env var / CLI — you have to override like with like.

$ DBT_WARN_ERROR_OPTIONS='{"include": "*", "exclude": ["NoNodesForSelectionCriteria", "UnusedResourceConfigPath"}' dbt run
$ dbt --warn-error-options '{"include": "*", "exclude": ["NoNodesForSelectionCriteria", "UnusedResourceConfigPath"}' run

Is it a bit ugly? Yes. But it's a single variable, you only need to set it once, you can pop it in a yaml/json validator first, and ultimately, this is a capability intended for fairly advanced users. More important that it's possible on the CLI, than it being ergonomic & delightful. Ultimately, we should also move UserConfig somewhere else (#6266) such that dbt Cloud users can actually make use of it, without needing to modify profiles.yml directly.

These two are still identical:

$ dbt --warn-error run
$ dbt --warn-error-options '{"include": "all"}' run

Theoretically these should also be equivalent:

$ dbt --no-warn-error run
$ dbt --warn-error-options '{"include": []}' run

Up to us if we eventually (dbt v2) deprecate the existing --warn-error config. It's a lot sugarier for sure. No deprecation warning for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli enhancement New feature or request logging
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants