Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to select using graph-operators when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST #728

Merged
merged 15 commits into from
Dec 5, 2023

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Nov 30, 2023

Add support for the following when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST:

  • Support selection of model by name
  • Support the selection of models by name & their children (with or without degrees)
  • Support the selection of models by name & their parents (with or without degrees)
  • Support intersections and unions involving graph selectors (with or without other supported selectors, eg. tags)

Examples of select/exclusion statements that now work regardless of the LoadMode being used:

model_a
+model_b
model_c+
+model_d+
2+model_e
model_f+3
model_f+,tag:nightly

Related dbt documentation:
https://docs.getdbt.com/reference/node-selection/graph-operators
https://docs.getdbt.com/reference/node-selection/set-operators

Limitations:

  • The at operator is not supported yet (@)
  • If users opt to use graph selector, it will increase the DAG parsing time and the task execution time when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST

This PR improves and extends the original implementation proposed by @tseruga in #429. Some of the changes that were introduced on top of the original PR:

  • Add support to descendants (before only precursors were supported)
  • Add support to different depths/degrees of precursors/descendants
  • Add support to the union between graph operators and graph/non-graph operators
  • Add support to the intersection between graph operators and graph/non-graph operators

Closes: #684

Co-authored-by: Tyler Seruga [email protected]

@tatiana tatiana requested a review from a team as a code owner November 30, 2023 12:54
@tatiana tatiana requested a review from a team November 30, 2023 12:54
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 30, 2023
Copy link

netlify bot commented Nov 30, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 8ad0648
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/656f07265aa38200082d7c9d

@dosubot dosubot bot added area:selector Related to selector, like DAG selector, DBT selector, etc dbt:run Primarily related to dbt run command or functionality parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Nov 30, 2023
Copy link

codecov bot commented Nov 30, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (e1f34ea) 92.99% compared to head (5fe2ccc) 93.05%.
Report is 1 commits behind head on main.

❗ Current head 5fe2ccc differs from pull request most recent head 8ad0648. Consider uploading reports for the commit 8ad0648 to get more accurate results

Files Patch % Lines
cosmos/dbt/selector.py 94.33% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #728      +/-   ##
==========================================
+ Coverage   92.99%   93.05%   +0.05%     
==========================================
  Files          55       55              
  Lines        2313     2404      +91     
==========================================
+ Hits         2151     2237      +86     
- Misses        162      167       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tatiana tatiana added this to the 1.3.0 milestone Nov 30, 2023
@tatiana tatiana added the parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing label Nov 30, 2023
Copy link
Contributor

@harels harels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor doc string comment, lgtm

cosmos/dbt/selector.py Outdated Show resolved Hide resolved
Co-authored-by: Harel Shein <[email protected]>
@tatiana tatiana merged commit 6deba22 into main Dec 5, 2023
38 of 39 checks passed
@tatiana tatiana deleted the custom-selector-upstream-model branch December 5, 2023 11:19
tatiana added a commit that referenced this pull request Dec 7, 2023
Features

* Add ProfileMapping for Vertica by @perttus in #540 and #688
* Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in #649
* Add support to select using (some) graph operators when using LoadMode.CUSTOM and LoadMode.DBT_MANIFEST by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log pr… by @agreenburg in #648
* Add operator_args full_refresh as a templated field by @joppevos in #623
* Expose environment variables and dbt variables in ProjectConfig by @jbandoro in #735

Enhancements

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to dbt_packages when dbt_deps is False when using LoadMode.DBT_LS by @DanMawdsleyBA in #730
* Support no profile_config for ExecutionMode.KUBERNETES and ExecutionMode.DOCKER by @MrBones757 and @tatiana in #681 and #731
* Add aws_session_token for Athena mapping by @benjamin-awd in #663

Others

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in #731
* Speed up integration tests by @jbandoro in #732
@tatiana tatiana mentioned this pull request Dec 7, 2023
tatiana added a commit that referenced this pull request Dec 7, 2023
Features

* Add ProfileMapping for Vertica by @perttus in #540 and #688
* Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in #649
* Add support to select using (some) graph operators when using LoadMode.CUSTOM and LoadMode.DBT_MANIFEST by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log pr… by @agreenburg in #648
* Add operator_args full_refresh as a templated field by @joppevos in #623
* Expose environment variables and dbt variables in ProjectConfig by @jbandoro in #735

Enhancements

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to dbt_packages when dbt_deps is False when using LoadMode.DBT_LS by @DanMawdsleyBA in #730
* Support no profile_config for ExecutionMode.KUBERNETES and ExecutionMode.DOCKER by @MrBones757 and @tatiana in #681 and #731
* Add aws_session_token for Athena mapping by @benjamin-awd in #663

Others

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in #731
* Speed up integration tests by @jbandoro in #732
jbandoro pushed a commit that referenced this pull request Dec 7, 2023
**Features**

* Add `ProfileMapping` for Snowflake encrypted private key path by
@ivanstillfront in #608
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add `DbtDocsGCSOperator` for uploading dbt docs to GCS by @jbandoro in
#616
* Add support to select using (some) graph operators when using
`LoadMode.CUSTOM` and `LoadMode.DBT_MANIFEST` by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to `dbt_packages` when `dbt_deps` is False when
using `LoadMode.DBT_LS` by @DanMawdsleyBA in #730
* Support no `profile_config` for `ExecutionMode.KUBERNETES` and
`ExecutionMode.DOCKER` by @MrBones757 and @tatiana in #681 and #731
* Add `aws_session_token` for Athena mapping by @benjamin-awd in #663

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731
* Speed up integration tests by @jbandoro in #732
@tatiana tatiana mentioned this pull request Jan 4, 2024
tatiana added a commit that referenced this pull request Jan 4, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in #733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in #728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in #540, #688 and #741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in #608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in #616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in #768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in #730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in #663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in #758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in #759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in #772

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Speed up integration tests by @jbandoro in #732
* Fix README quickstart link in by @RNHTTR in #776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in #766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in #757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731 and #779
* pre-commit updates in #775, #770, #762
ykuc pushed a commit to ykuc/astronomer-cosmos that referenced this pull request Jan 11, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
…OM` or `LoadMode.DBT_MANIFEST` (astronomer#728)

Add support for the following when using `LoadMode.CUSTOM` or
`LoadMode.DBT_MANIFEST`:

* Support selection of model by name
* Support the selection of models by name & their children (with or
without degrees)
* Support the selection of models by name & their parents (with or
without degrees)
* Support intersections and unions involving graph selectors (with or
without other supported selectors, eg. tags)

Examples of select/exclusion statements that now work regardless of the
`LoadMode` being used:

```
model_a
+model_b
model_c+
+model_d+
2+model_e
model_f+3
model_f+,tag:nightly
```

Related dbt documentation:
https://docs.getdbt.com/reference/node-selection/graph-operators
https://docs.getdbt.com/reference/node-selection/set-operators

Limitations:
* The at operator is not supported yet (`@`)
* If users opt to use graph selector, it will increase the DAG parsing
time and the task execution time when using `LoadMode.CUSTOM` or
`LoadMode.DBT_MANIFEST`


This PR improves and extends the original implementation proposed by
@tseruga in astronomer#429. Some of the changes that were introduced on top of the
original PR:
* Add support to descendants (before only precursors were supported)
* Add support to different depths/degrees of precursors/descendants
* Add support to the union between graph operators and graph/non-graph
operators
* Add support to the intersection between graph operators and
graph/non-graph operators

Closes: astronomer#684

Co-authored-by: Tyler Seruga <[email protected]>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add `ProfileMapping` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add `DbtDocsGCSOperator` for uploading dbt docs to GCS by @jbandoro in
astronomer#616
* Add support to select using (some) graph operators when using
`LoadMode.CUSTOM` and `LoadMode.DBT_MANIFEST` by @tatiana in astronomer#728
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to `dbt_packages` when `dbt_deps` is False when
using `LoadMode.DBT_LS` by @DanMawdsleyBA in astronomer#730
* Support no `profile_config` for `ExecutionMode.KUBERNETES` and
`ExecutionMode.DOCKER` by @MrBones757 and @tatiana in astronomer#681 and astronomer#731
* Add `aws_session_token` for Athena mapping by @benjamin-awd in astronomer#663

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731
* Speed up integration tests by @jbandoro in astronomer#732
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:selector Related to selector, like DAG selector, DBT selector, etc dbt:run Primarily related to dbt run command or functionality parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support to select using (some) graph-operators when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST
3 participants