
Add on_kill to Databricks Workflow Operator #42115

Merged


@R7L208 (Contributor) commented Sep 9, 2024

The Databricks provider did not implement on_kill to cancel tasks generated by _CreateDatabricksWorkflowOperator. This caused data quality issues: Airflow would report a task as cancelled after a timeout, while the corresponding workflow task kept running on Databricks.

This PR implements on_kill for _CreateDatabricksWorkflowOperator.
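The fix relies on a common Airflow pattern: the operator records the identifier of the remote run it starts, and on_kill, which Airflow calls when a task instance is killed (for example when it exceeds its execution timeout, as in the scenario above), uses that identifier to request cancellation of the remote run. The sketch below illustrates only that pattern, under stated assumptions: `FakeJobsClient`, `WorkflowOperatorSketch`, `submit_run` and `cancel_run` are hypothetical stand-ins for the operator and the DatabricksHook calls it makes, not the provider's actual code, and the example does not import Airflow.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

log = logging.getLogger(__name__)


@dataclass
class FakeJobsClient:
    """Hypothetical stand-in for a Databricks hook; tracks submitted and cancelled runs."""

    next_run_id: int = 1
    active: set[int] = field(default_factory=set)
    cancelled: list[int] = field(default_factory=list)

    def submit_run(self) -> int:
        # Pretend to start a remote workflow run and return its ID.
        run_id = self.next_run_id
        self.next_run_id += 1
        self.active.add(run_id)
        return run_id

    def cancel_run(self, run_id: int) -> None:
        # Pretend to ask the remote service to stop the run.
        self.active.discard(run_id)
        self.cancelled.append(run_id)


class WorkflowOperatorSketch:
    """Hypothetical operator that remembers its run ID so on_kill can cancel it remotely."""

    def __init__(self, client: FakeJobsClient) -> None:
        self.client = client
        self.run_id: int | None = None

    def execute(self) -> int:
        # Store the run ID on the instance; without it, on_kill has no way
        # to find the remote run, which would keep running after a timeout.
        self.run_id = self.client.submit_run()
        return self.run_id

    def on_kill(self) -> None:
        if self.run_id is None:
            # Killed before anything was submitted: nothing to cancel.
            log.warning("on_kill called before a run was submitted")
            return
        self.client.cancel_run(self.run_id)
        log.info("Requested cancellation of run %s", self.run_id)


# Usage: simulate a run that gets killed (e.g. on timeout) after submission.
client = FakeJobsClient()
op = WorkflowOperatorSketch(client)
op.execute()
op.on_kill()
print(client.cancelled, client.active)  # prints: [1] set()
```

Keeping the guard for a missing run ID matters because Airflow may kill a task before execute has submitted anything, in which case there is no remote run to cancel.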




boring-cyborg bot commented Sep 9, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https:/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch 5 times, most recently from 70f8c96 to c3f1e9b (September 13, 2024 15:48)
@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch from c3f1e9b to b80e162 (September 18, 2024 19:31)
@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch 2 times, most recently from 8a86b3b to ac988bb (September 27, 2024 16:38)
@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch from ac988bb to 532feb3 (October 1, 2024 23:55)
@R7L208 (Contributor Author) commented Oct 1, 2024

hey @pankajkoti - Any ETA on when you'd be able to review this PR? 🙏

@pankajkoti (Member) left a comment

Hi @R7L208, apologies, I missed reviewing this so far.

I am happy to review the changes for databricks_workflow.py, but I am not yet comfortable reviewing the changes in databricks_sql.py (operators & hooks) as I do not have enough expertise in that area. Would you please separate out the two changes (move the databricks_sql.py-related changes to another PR)? Then we can invite an expert to review the other PR.

@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch from 532feb3 to 10ceb61 (October 2, 2024 14:28)
@R7L208 (Contributor Author) commented Oct 2, 2024

Opened #42668

@pankajkoti @eladkal - can you please assign someone to review?

@R7L208 changed the title from "Add on_kill or equivalent to Databricks Operators/Hooks to cancel timed out queries" to "Add on_kill to Databricks Workflow Operator to cancel timed out queries" (Oct 2, 2024)
@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch from d45c9d9 to bf5237a (October 2, 2024 14:45)
@pankajkoti requested a review from Lee-W (October 2, 2024 15:08)
@pankajkoti (Member) commented

@R7L208 Can you please update the PR description based on the altered scope of this PR?

@R7L208 changed the title from "Add on_kill to Databricks Workflow Operator to cancel timed out queries" to "Add on_kill to Databricks Workflow Operator" (Oct 2, 2024)
@R7L208 (Contributor Author) commented Oct 2, 2024

@pankajkoti - it's been updated

@R7L208 force-pushed the lorin/databricks-operators-on_kill-override branch from bf5237a to bf0555b (October 2, 2024 17:36)
@pankajkoti (Member) commented Oct 2, 2024

> @pankajkoti - it's been updated

@R7L208 it's not. It mentions DatabricksSqlHook queries ("SQL queries submitted by DatabricksSqlHook") and using threading ("uses threading to cancel SQL queries submitted by DatabricksSqlHook.run()"), neither of which is part of this PR anymore. Could you please re-read the description and update it so it is limited to the scope of this PR?

Also, _CreateDatabricksWorkflowOperator does not rely on DatabricksSqlHook, but leverages DatabricksHook just in case you missed checking that :)

@R7L208 (Contributor Author) commented Oct 2, 2024

@pankajkoti - Apologies! I read the PR title instead of the PR description 🤦

PR description is now updated

@Lee-W merged commit 5d51bee into apache:main on Oct 3, 2024 (56 checks passed)

boring-cyborg bot commented Oct 3, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
* add on_kill override to databricks workflow operator

* on_kill equivalent for DatabricksSqlOperator

* add tests for create_timeout_thread

* add note for on_kill in DatabricksCopyIntoOperator

* chore: static checks

* remove changes for databricks_sql.py for PR isolated to databricks_workflows.py

---------

Co-authored-by: Lorin <[email protected]>