DatabricksNotebookOperator fails when task_key is longer than 100 characters #41816
Comments
If there is a 100-character limit, why doesn't Databricks raise an error when you create the notebook? This feels more like a feature request to Databricks.
@eladkal, that makes sense. But this case still needs to be handled on the Airflow side as well, since we want a way to track the job. I have a few suggestions:
@rawwar I think we can at least do
related to #41816: Adds a warning log to indicate failure if task_key is longer than 100 characters.
Hey, I've run into the same issue today. In our case, we're using an in-house DAG factory that generates DAGs from configuration files. This can result in both long DAG IDs and long task IDs, since the task IDs also contain task group names. When I try to run this DAG,
I see three options here:
The warning for this case is already merged. Should we keep this issue open, or close it once the underlying problem is addressed properly, everyone?
If we agree on an approach, I can work on this one. So far, I like the idea of the task key being passed by the user. If the user doesn't provide one, we generate a random ID (possibly a UUID).
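The user-supplied-key-with-UUID-fallback idea could be sketched roughly like this (a minimal sketch only; the function name, parameters, and error handling are assumptions, not the provider's actual API):

```python
import uuid
from typing import Optional


def resolve_task_key(user_key: Optional[str], max_length: int = 100) -> str:
    """Use the caller-supplied task_key if it fits within the Databricks
    limit; otherwise fall back to a random UUID (always 36 characters,
    well under the limit). Hypothetical helper for illustration."""
    if user_key is not None:
        if len(user_key) > max_length:
            raise ValueError(f"task_key exceeds {max_length} characters")
        return user_key
    return str(uuid.uuid4())
```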
Assigned to you - I like the idea too.
Another idea, @rawwar: check whether we can compute a hash of the task key built from the current combination of DAG ID and task ID, instead of a completely random UUID. If we generate a random UUID, we would also need to store it against each task so that the task can be monitored accordingly.
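A deterministic variant of that idea could look like the sketch below (the `__` separator, digest length, and function name are assumptions for illustration, not the provider's actual implementation):

```python
import hashlib


def make_task_key(dag_id: str, task_id: str, max_length: int = 100) -> str:
    """Build a task_key from dag_id and task_id. If the combined key is
    too long, truncate it and append a short SHA-256 digest of the full
    key, so the result is deterministic (same inputs give the same key)
    and still distinguishes tasks whose truncated prefixes would collide."""
    key = f"{dag_id}__{task_id}"
    if len(key) <= max_length:
        return key
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]
    # Reserve 9 characters at the end for "_" plus the 8-character digest.
    return f"{key[:max_length - 9]}_{digest}"
```

Because the key is derived from the IDs rather than random, nothing extra would need to be stored in order to locate the task again later.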
Yeah, I think that's better. I will just
Agreed
Apache Airflow version
main (development)
If "Other Airflow 2 version" selected, which one?
No response
What happened?
According to the Databricks API documentation, task_key has a maximum length of 100 characters: Link.
When the DAG ID and task ID strings are long enough, we create a task_key with more than 100 characters. However, this limit is not enforced during job creation, so the job is created with the full name. But when fetching the job run details via the get-run endpoint, the API truncates the task_key, which causes a KeyError in the following line of code: Link
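The failure mode can be illustrated hypothetically like this (the dict lookup stands in for the operator's run-monitoring code; all names here are made up for the example):

```python
# The job is created with the full key, but the get-run response
# truncates task_key to 100 characters, so a lookup by the original
# (untruncated) key raises KeyError.
full_key = "dag__task_" + "x" * 110            # 120 characters
api_response = {full_key[:100]: {"run_id": 1}}  # keys as the API returns them

try:
    api_response[full_key]  # roughly what the operator does when monitoring
except KeyError:
    print("KeyError: task_key was truncated in the API response")
```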
What you think should happen instead?
The task key should be unique. Hence, we could include a UUID instead of using dag_id + task_id.
How to reproduce
Use a dag_id and task_id whose names together are longer than 100 characters, and use DatabricksNotebookOperator.
Operating System
Debian GNU/Linux 12 (bookworm)
Versions of Apache Airflow Providers
apache-airflow-providers-databricks==6.8.0
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct