Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new parsing method LoadMode.DBT_LS_FILE #733

Merged
merged 27 commits into from
Dec 14, 2023

Conversation

woogakoki
Copy link
Contributor

@woogakoki woogakoki commented Dec 1, 2023

Description

Adds a new load method which is in between manifest and dbt_ls. Goal of this is to have a dbt ls file with the output prepared and only use the parsing logic of dbt_ls load method. This should increase performance compared to using dbt_ls.
However this method needs one output file for each taskgroup build.

Breaking Change?

No breaking changes, it just adds a new load method on top but leaves the rest as is.

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@woogakoki woogakoki requested a review from a team as a code owner December 1, 2023 08:52
@woogakoki woogakoki requested a review from a team December 1, 2023 08:52
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 1, 2023
Copy link

netlify bot commented Dec 1, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 5943619
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/657ab7e1e103910008008a31

@dosubot dosubot bot added area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:parse Primarily related to dbt parse command or functionality parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Dec 1, 2023
Copy link

codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4369987) 93.18% compared to head (5943619) 93.23%.
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #733      +/-   ##
==========================================
+ Coverage   93.18%   93.23%   +0.04%     
==========================================
  Files          55       55              
  Lines        2451     2469      +18     
==========================================
+ Hits         2284     2302      +18     
  Misses        167      167              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @woogakoki, thanks a lot for the contribution. This looks great!

Your change will help with the performance for users who like the precision of the DBT_LS load method but don't want to have the pay the cost of running it during DAG parsing/task execution.

Three main feedback points:

  1. Please, could you address the pre-commit errors
  2. The coverage checks are also failing, if you could increase the test coverage by addressing the PR comments created by Codecov, it would be amazing!
  3. Please, could you change one of the existing example DAGs in dev/dags to use this parsing method? It could perhaps be this one: https:/astronomer/astronomer-cosmos/blob/main/dev/dags/user_defined_profile.py

I left some minor comments inline.

@tatiana tatiana added this to the 1.3.0 milestone Dec 4, 2023
@tatiana tatiana changed the title Feat/add new loadmode Add new parsing method LoadMode.DBT_LS_FILE Dec 5, 2023
@tatiana
Copy link
Collaborator

tatiana commented Dec 7, 2023

Hey @woogakoki, we're super close to getting this merged!
Once you address the code coverage comments, we can release it as part of a 1.3 alpha.

@woogakoki
Copy link
Contributor Author

woogakoki commented Dec 7, 2023

Hey @woogakoki, we're super close to getting this merged! Once you address the code coverage comments, we can release it as part of a 1.3 alpha.

Hi @tatiana, thanks for the heads up. Will try to fix it over the course of today then!

@woogakoki
Copy link
Contributor Author

@woogakoki Just added some last comments before we can merge this feature. Can't wait to try this out. Do you by any chance have numbers on the performance improvement by using this method?

Unfortunatly not as I only tested it locally

@woogakoki
Copy link
Contributor Author

@tatiana The Type-Check test is failing due to some issue in cosmos/operators/kubernetes.py. Is that known and can be ignored?

@tatiana
Copy link
Collaborator

tatiana commented Dec 13, 2023

@woogakoki Thanks for addressing the feedback. Yes, the issue with the type check was solved by @jbandoro in #766. Once the checks pass, I'll merge your PR!

@tatiana tatiana added the lgtm This PR has been approved by a maintainer label Dec 13, 2023
@tatiana
Copy link
Collaborator

tatiana commented Dec 13, 2023

@woogakoki when you get a chance, please, check the failing tests:

FAILED tests/dbt/test_graph.py::test_load_via_dbt_ls_file - cosmos.dbt.graph.CosmosLoadDbtException: Unable to load dbt ls file without RenderConfig.project_path
===== 1 failed, 22 passed, 3 skipped, 330 deselected in 459.96s (0:07:39) ======

Also, once you rebase, we should see the type check passing 🤞

@tatiana tatiana merged commit 82e8db9 into astronomer:main Dec 14, 2023
42 checks passed
@tatiana
Copy link
Collaborator

tatiana commented Dec 14, 2023

Amazing work, @woogakoki , thank you very much for getting this work through

@woogakoki
Copy link
Contributor Author

Amazing work, @woogakoki , thank you very much for getting this work through

Thank you for the support @tatiana! Super happy it went through this year already.

@tatiana tatiana mentioned this pull request Jan 4, 2024
tatiana added a commit that referenced this pull request Jan 4, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in #733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in #728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in #540, #688 and #741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in #608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in #616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in #768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in #730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in #663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in #758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in #759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in #772

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Speed up integration tests by @jbandoro in #732
* Fix README quickstart link in by @RNHTTR in #776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in #766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in #757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731 and #779
* pre-commit updates in #775, #770, #762
ykuc pushed a commit to ykuc/astronomer-cosmos that referenced this pull request Jan 11, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
Adds a new load method, which is in between manifest and dbt_ls. This aims to have a dbt ls file with the output prepared and only use the
parsing logic of the dbt_ls load method. This should increase performance
compared to using dbt_ls.

However, this method needs one output file for each taskgroup build.
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:parse Primarily related to dbt parse command or functionality lgtm This PR has been approved by a maintainer parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants