[CT-469] Combine --select/--exclude with --selector when used together? #5009

Open
1 task done
solomonshorser opened this issue Apr 7, 2022 · 21 comments · May be fixed by #7101
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes

Comments

@solomonshorser

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I try to test my models like this:

% dbt test --selector staging --exclude stg_client

However, tests that apply to stg_client (a staging model) are still executed.

Expected Behavior

I expected the tests that relate to the model stg_client to be skipped, since I specified --exclude stg_client.

Steps To Reproduce

  1. Create a YAML selector that includes several models with tests.
  2. Try to test with the selector, but use --exclude to exclude a single model from testing.

Relevant log output

No response

Environment

- OS: Mac OS 11.6.5
- Python: 3.9.10
- dbt: 1.0.3

What database are you using dbt with?

bigquery

Additional Context

No response

@solomonshorser solomonshorser added bug Something isn't working triage labels Apr 7, 2022
@github-actions github-actions bot changed the title [Bug] --exclude is ignored when --selector is used [CT-469] [Bug] --exclude is ignored when --selector is used Apr 7, 2022
@jtcohen6
Contributor

jtcohen6 commented Apr 8, 2022

Hey @solomonshorser, the short answer is that --selector takes precedence, and --exclude is ignored.

This has come up before: dbt-labs/docs.getdbt.com#803 (originally transferred from this repo). The solution proposed then was to update our docs. It's there now, but tucked away in a note at the very bottom of https://docs.getdbt.com/reference/node-selection/syntax:

Note that when you're using --selector, most other flags (namely --select and --exclude) will be ignored.

This could definitely be clearer! Do you think the right answer looks like:

@jtcohen6 jtcohen6 added node selection Functionality and syntax for selecting DAG nodes Team:Execution and removed triage bug Something isn't working labels Apr 8, 2022
@jtcohen6 jtcohen6 changed the title [CT-469] [Bug] --exclude is ignored when --selector is used [CT-469] Clarify that --exclude is ignored when --selector is used Apr 8, 2022
@solomonshorser
Author

solomonshorser commented Apr 8, 2022

@jtcohen6 I had only read the YAML Selectors page, so some clarification there would help. A warning message might be nice too; it would have saved me some frustration. Or maybe, instead of a warning, an actual hard-stop error message: "Incompatible selection criteria were given..." or something like that.

Back to my original problem: I have a selector (not a simple one) that selects a number of models. I'd like to test everything covered by that selector, but there is one particular model that has some known testing issues. I'd like to be able to test all of the models except that problematic model (until other parties are able to either confirm certain changes to the test or the underlying data). Is there any way to do this besides modifying the YAML selector itself? This exclusion is not meant to be permanent, just something I want to do on an ad-hoc basis.

@jtcohen6
Contributor

jtcohen6 commented Apr 8, 2022

Makes total sense. #4827 proposes to make possible the inheritance of one selector by another. That alone should provide for a more ergonomic approach to your situation.

A bigger question, one I'm not sure I feel ready to answer: Should passing --selector alongside --select/--exclude have the effect of inheriting the selector, and then applying the selection/exclusion logic? What about passing --select/--exclude at all when there's a selector defined with default: true? Historically, our approach has been:

  • Explicitly passing --select/--exclude causes a default selector to be ignored entirely
  • Explicitly passing --selector flag causes --select/--exclude to be ignored entirely
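
The two historical rules above can be written down as a small decision function. This is only a sketch: `effective_criteria` and its parameter names are hypothetical and are not part of dbt-core; it only reports which inputs would be honoured, without resolving any nodes.

```python
from typing import Optional


def effective_criteria(
    select: Optional[str] = None,
    exclude: Optional[str] = None,
    selector: Optional[str] = None,
    default_selector: Optional[str] = None,
) -> dict:
    """Return which selection inputs are honoured under the historical rules.

    Hypothetical helper for illustration; not a dbt-core function.
    """
    if selector is not None:
        # An explicit --selector wins: --select/--exclude are ignored entirely.
        return {"selector": selector}
    if select is not None or exclude is not None:
        # Explicit --select/--exclude: the default selector is ignored entirely.
        return {"select": select, "exclude": exclude}
    if default_selector is not None:
        # No flags at all: fall back to the selector marked `default: true`.
        return {"selector": default_selector}
    return {}


# The command from the original report: --exclude is silently dropped.
print(effective_criteria(selector="staging", exclude="stg_client"))
```

Under these rules the `--exclude stg_client` from the original report never reaches the selection logic, which matches the behaviour described at the top of the issue.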

@solomonshorser
Author

Should passing --selector alongside --select/--exclude have the effect of inheriting the selector, and then applying the selection/exclusion logic?

That would make sense to me, though I don't know if "inherit" is the word?
I thought of it as the set of whatever was selected by --selector, unioned with whatever is selected by --select, minus whatever is excluded by --exclude.

Something like: dbt test --selector staging_models --select extra_models --exclude problematic_models I thought would have an effect like running on the set: staging_models UNION extra_models MINUS problematic_models
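
As a rough illustration of that set expression, here it is with plain Python sets. The model names are made up, and each set stands in for whatever the corresponding flag would resolve to.

```python
# Hypothetical model names standing in for what each flag would resolve to.
staging_models = {"stg_client", "stg_orders", "stg_payments"}  # --selector staging_models
extra_models = {"dim_dates"}                                   # --select extra_models
problematic_models = {"stg_client"}                            # --exclude problematic_models

# staging_models UNION extra_models MINUS problematic_models
to_run = (staging_models | extra_models) - problematic_models

print(sorted(to_run))  # ['dim_dates', 'stg_orders', 'stg_payments']
```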

@jtcohen6 jtcohen6 added the enhancement New feature or request label Apr 26, 2022
@jtcohen6 jtcohen6 changed the title [CT-469] Clarify that --exclude is ignored when --selector is used [CT-469] Combine --selector/--exclude with --selector when used together? Apr 26, 2022
@jtcohen6 jtcohen6 changed the title [CT-469] Combine --selector/--exclude with --selector when used together? [CT-469] Combine --select/--exclude with --selector when used together? Apr 26, 2022
@ghost

ghost commented Sep 7, 2022

Just encountered a case where this would be very helpful functionality. Generally we'll want a standard selector, but being able to add an additional intersection, union, or exclude would be great so that we can tweak a selector for a particular execution dynamically.

@stumelius

I also have a use case where I'd like to use selectors in both --select and --exclude arguments. Example where I run all the marketing models but exclude any ecommerce models in the marketing DAG:

dbt run --select selector:marketing --exclude selector:ecommerce

@ghost

ghost commented Feb 14, 2023

I think this would be a very nice behaviour to add indeed.
Another interesting use case would be to build only models from a selector that passed tests on the previous run:
dbt build --selector <my_selector> --exclude 1+result:fail --state ./target.

@jtcohen6
Contributor

I've come around on this. Here's how I think it should work:

  • The --selector is resolved first. Then, it's "unioned" together with any criteria passed into --select &/or --exclude.
  • Default selector is ignored if --select/--exclude is passed (status quo). You need to explicitly pass both --selector and --select/--exclude to achieve the combination.

I'm going to mark this as help_wanted for any community member who'd be interested in working on it :)
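
One possible reading of the proposal in this comment, sketched over already-resolved node sets. The function `resolve` is hypothetical, and it assumes that "unioned" means: start from the selector's nodes, add anything from --select, then subtract anything from --exclude. It also assumes that --exclude without --select starts from all nodes.

```python
from typing import Optional, Set


def resolve(
    all_nodes: Set[str],
    selector: Optional[Set[str]] = None,
    select: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
    default_selector: Optional[Set[str]] = None,
) -> Set[str]:
    """Combine resolved node sets; None means the flag was not passed.

    Hypothetical sketch of the proposed behaviour, not dbt-core code.
    """
    if selector is not None:
        # Explicit --selector is resolved first, then combined with --select.
        included = set(selector) | (select or set())
    elif select is not None or exclude is not None:
        # Status quo: the default selector is ignored once --select/--exclude is passed.
        included = set(select) if select is not None else set(all_nodes)
    elif default_selector is not None:
        included = set(default_selector)
    else:
        included = set(all_nodes)
    # --exclude is applied last, whatever produced the included set.
    return included - (exclude or set())


all_nodes = {"stg_client", "stg_orders", "dim_dates", "fct_sales"}
staging = {"stg_client", "stg_orders"}

# dbt test --selector staging --exclude stg_client
print(sorted(resolve(all_nodes, selector=staging, exclude={"stg_client"})))
```

With this reading, the command from the original report would skip `stg_client` while still covering the rest of the selector.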

@jtcohen6 jtcohen6 added the help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors label Feb 14, 2023
@acurtis-evi

I think it would be good if the following could be done

  1. Allow selectors to be part of select/exclude logic. This includes making selectors respect + for children and parents as well as other operators
  2. Allow a way to override one or more selectors in yml from command line. --select:selector_name, --exclude:selector_name
  3. Allow command line syntax to work in yml.

I believe this would be backwards compatible but also mostly future complete as in it provides intuitive ways to accomplish whatever selection is desired through a combination of predefined and command line options.

If the command line syntax worked identically in the yml, then it could also simplify the documentation and things would just work the way the end user expects them to work.

@acurtis-evi

acurtis-evi commented Feb 28, 2023

Also, happy to help on this. Curious if the select/exclude is represented in the same way internally as each selector?

@acurtis-evi acurtis-evi linked a pull request Mar 1, 2023 that will close this issue
6 tasks
@acurtis-evi

#7101

I'm starting to work out how this could be done in the PR above; I have only partially tested it.

@acurtis-evi

I believe that this is ready for review through #7101

@acurtis-evi

I did things differently than what @jtcohen6 suggested.

Basically, --select / --exclude create a new selector named "arg_selector" which can be referenced in selectors.yml. The select / exclude syntax also supports selector:<selector_name>.

I realize it is different, but I think the PR makes the change seamless.

Also, unified the syntax between the yaml selectors and the command line (introducing EXCEPT).

dbt --select "selector:selector_one EXCEPT some_model+"

is the same as

dbt --select selector:selector_one  --exclude some_model+

In addition, the selector yaml syntax allows selector:... and the union/intersect/EXCEPT logic.

definition: selector:selector_one EXCEPT some_model+
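
To make the claimed equivalence concrete, here is a minimal sketch of how a single `EXCEPT` string could be split into a select part and an exclude part. `split_except` is a hypothetical helper, not the parser from the PR, and it handles only one top-level `EXCEPT`.

```python
from typing import Optional, Tuple


def split_except(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'A EXCEPT B' into (select, exclude); exclude is None if absent.

    Illustrative only: a real parser would also handle unions, intersections
    and nesting.
    """
    select, sep, exclude = spec.partition(" EXCEPT ")
    return select.strip(), (exclude.strip() if sep else None)


print(split_except("selector:selector_one EXCEPT some_model+"))
# ('selector:selector_one', 'some_model+')
```

The two parts correspond to the `--select` and `--exclude` arguments of the second command in this comment.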

@stumelius

stumelius commented Mar 9, 2023

@acurtis-evi This would work for the use case I described. Did I understand correctly that both

  • dbt run --select selector:marketing --exclude selector:ecommerce, and
  • dbt run --select "selector:marketing EXCEPT selector:ecommerce"

would work and are interchangeable after the change?

I also have a use case where I'd like to use selectors in both --select and --exclude arguments. Example where I run all the marketing models but exclude any ecommerce models in the marketing DAG:

dbt run --select selector:marketing --exclude selector:ecommerce

@acurtis-evi

acurtis-evi commented Mar 9, 2023

Yes, both options would work. In addition to this, the yaml selector syntax is unified with the command line syntax. The old yaml syntax is supported, but the plain description can simply be a complex string with parent, child relations, unions, intersections, and EXCEPT.

selectors:
  - name: some_selector
    description: selector:marketing EXCEPT selector:ecommerce

and then you can do the dbt run using

dbt run --selector some_selector

I also introduced an arg_selector

dbt run --select selector:marketing --exclude selector:ecommerce

AND

dbt run --selector arg_selector --select selector:marketing --exclude selector:ecommerce

are identical and the arg_selector can be used in the yaml selectors like this

selectors:
  - name: some_selected_selector
    description: selector:some_selector,selector:arg_selector
  - name: some_selector
    description: selector:marketing EXCEPT selector:ecommerce

Now you can use that selector like

dbt run --selector some_selected_selector --select some_models+

and it will intersect selector:marketing EXCEPT selector:ecommerce and some_models+

The selector:some_selector allows for modifiers as well such as +1

dbt run --select selector:marketing+1 --exclude selector:ecommerce

@ahrussell

@acurtis-evi thanks for the PR! This feature would definitely be helpful for a use-case that we have. Is there any new info on this issue?

@Battiloni

Battiloni commented Mar 22, 2024

# ENV
dbt-snowflake==1.7.1

Found this issue while looking for more info about the selector argument - here's an interesting edge case I have below:

  • I defined a selector called $my_project_name to only check for the nodes of $my_project_name by default
    • thus, I'm able to run a command like dbt run and it will only run the models of $my_project_name
  • I also installed the package dbt_project_evaluator
  • In my project, I have a folder representing a schema which is called marts
    • dbt_project_evaluator also has a layer of models called marts

Finally, when I run the following commands, I don't get the result and behaviour I would have expected

dbt ls --select models/marts
  • returns all marts models from all packages ($my_project_name and dbt_project_evaluator)
dbt ls --selector $my_project_name --select models/marts
  • returns all the nodes from $my_project_name and not using the --select argument

Expected result from both above commands: returns only marts models from $my_project_name

  • it would be the same behaviour that @acurtis-evi is talking about

I also think combining --select/--exclude with --selector should be implemented. As it stands, there is a high risk that users use these flags incorrectly and wrongly select nodes from their project and installed packages

  • which IMO is a very big caveat at the moment

@djbelknapaw

I'd love to see the ability to combine selectors + command-line selection. My use case is that I'm trying to define super-sets of models I want to run so I have a consistent definition - for example, "things I want to run hourly". Then I want to intersect that with a selector per job, for example, "data_product_x and its parents and children".

Defining additional selectors for each data product feels like overkill to me. Ideally I could run this with something like:

dbt ls --select selector:hourly,+data_product_x+

Any chance we might see at least the selector: method added? This issue's been around for a while as has the PR.
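
The intersection described in this comment can be sketched over a toy DAG. In dbt's selection syntax a comma means intersection and `+model+` means the model plus all of its ancestors and descendants. Everything else here (the model names, the `hourly` set, the `reachable` helper) is hypothetical.

```python
from typing import Dict, Set

# Hypothetical DAG: each key lists its direct children.
children: Dict[str, Set[str]] = {
    "raw_events": {"stg_events"},
    "stg_events": {"data_product_x"},
    "data_product_x": {"x_dashboard", "x_export"},
    "x_dashboard": set(),
    "x_export": set(),
    "unrelated_model": set(),
}

# What a hypothetical `hourly` selector would resolve to.
hourly = {"stg_events", "data_product_x", "x_dashboard", "unrelated_model"}


def reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Every node reachable from `start` by following `edges` (start excluded)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in edges.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Invert the child edges to obtain parent edges.
parents: Dict[str, Set[str]] = {node: set() for node in children}
for node, kids in children.items():
    for kid in kids:
        parents[kid].add(node)

# +data_product_x+ : the model, all of its ancestors, and all of its descendants.
product_graph = (
    {"data_product_x"}
    | reachable("data_product_x", parents)
    | reachable("data_product_x", children)
)

# selector:hourly,+data_product_x+  (comma = intersection)
to_run = hourly & product_graph
print(sorted(to_run))  # ['data_product_x', 'stg_events', 'x_dashboard']
```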

@mroy-seedbox

@djbelknapaw: the way we're handling this right now is via YAML anchors. It's not as user friendly, but at least it works well for now.

Example:

  - name: data_product_x
    definition:
      union: &data_product_x
        - 'model1'
        - 'model2'
        - '...'
  - name: data_product_x_hourly
    definition:
      intersection:
        - union: *data_product_x
        - tag: hourly
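
For readers unfamiliar with anchors: the alias `*data_product_x` simply repeats the list defined at `&data_product_x`, so the second selector is an intersection of that union with the `hourly` tag. Below is a small sketch of how such a nested definition evaluates, using a made-up project and a tiny evaluator that supports only names, `tag`, `union` and `intersection`. It is not dbt's implementation.

```python
from typing import Dict, Set

# Hypothetical project: model name -> tags.
models: Dict[str, Set[str]] = {
    "model1": {"hourly"},
    "model2": {"daily"},
    "model3": {"hourly"},
}

# What `*data_product_x` expands to once the YAML alias is resolved.
data_product_x = ["model1", "model2"]

data_product_x_hourly = {
    "intersection": [
        {"union": data_product_x},
        {"tag": "hourly"},
    ]
}


def evaluate(definition) -> Set[str]:
    """Evaluate a tiny subset of selector definitions against `models`."""
    if isinstance(definition, str):
        # A bare string is treated as a model name.
        return {definition} if definition in models else set()
    if "tag" in definition:
        return {name for name, tags in models.items() if definition["tag"] in tags}
    if "union" in definition:
        return set().union(*(evaluate(d) for d in definition["union"]))
    if "intersection" in definition:
        return set.intersection(*(evaluate(d) for d in definition["intersection"]))
    raise ValueError(f"unsupported definition: {definition!r}")


print(sorted(evaluate(data_product_x_hourly)))  # ['model1']
```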

@indy-jonesy

indy-jonesy commented Apr 30, 2024

This was a bit of a surprise to find as an issue today. We'd really appreciate it if this could be resolved.
We are using anchors as suggested, but it's not clean. It adds overhead.

Further, it'd be great if these all didn't have to be in a single selectors.yml file, and could be broken out into separate files like schema.yml files can for maintainability.

@mroy-seedbox

Agreed, multiple files would be great! (although that's probably a separate feature request)

Our current selectors.yml is over 1000 lines....

It should be fairly simple to merge the selectors from multiple files into one single list in python (as long as all selectors are unique, otherwise raise an exception). 🤷‍♂️
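
A minimal sketch of that merge, assuming each file has already been parsed into its `selectors:` list of dicts. `merge_selectors` and the file paths are hypothetical; dbt does not currently expose such a function.

```python
from typing import Dict, List


def merge_selectors(files: Dict[str, List[dict]]) -> List[dict]:
    """Merge selector lists from several files, rejecting duplicate names.

    `files` maps a file path to its already-parsed `selectors:` list.
    """
    merged: List[dict] = []
    seen: Dict[str, str] = {}  # selector name -> file that first defined it
    for path, selectors in files.items():
        for selector in selectors:
            name = selector["name"]
            if name in seen:
                raise ValueError(
                    f"selector {name!r} is defined in both {seen[name]} and {path}"
                )
            seen[name] = path
            merged.append(selector)
    return merged


merged = merge_selectors(
    {
        "selectors/marketing.yml": [{"name": "marketing", "definition": "tag:marketing"}],
        "selectors/hourly.yml": [{"name": "hourly", "definition": "tag:hourly"}],
    }
)
print([s["name"] for s in merged])  # ['marketing', 'hourly']
```

Raising on a duplicate name, rather than letting the last file win, keeps the result independent of file ordering.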

9 participants