BeamRunPythonPipelineOperator doesn't push xcom until Pipeline completes, leaving DataflowSensors worthless #30007
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
Agreed.
Can I take care of this issue?
@hubert-pietron Sure thing, all yours!
I need to unassign myself; due to a change of work I currently don't have time to look into the problem. :/
job_id is stored in:
It is not stored in:
Code Reference:
If you modify your code to retrieve the Dataflow job_id correctly, you will be able to access it. To illustrate how this is done, here is sample code showing how to retrieve the Dataflow job id:
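The commenter's original snippet is not preserved in this thread. As a stand-in, here is a minimal, hypothetical sketch of pulling a Dataflow job id from XCom in a downstream task. The task id `start_python_pipeline` and the XCom key `dataflow_job_id` are assumptions taken from the discussion, and a small stub replaces Airflow's real `TaskInstance` so the sketch runs without an Airflow installation.

```python
class FakeTaskInstance:
    """Minimal stand-in for the XCom access of airflow.models.TaskInstance."""

    def __init__(self, store):
        self._store = store  # {(task_id, key): value}

    def xcom_pull(self, task_ids, key="return_value"):
        # Returns None when nothing was pushed, mirroring Airflow's behavior.
        return self._store.get((task_ids, key))


def get_dataflow_job_id(ti, beam_task_id="start_python_pipeline"):
    # After the pipeline launches, the operator is expected to have pushed
    # the job id under the "dataflow_job_id" key (key name per this thread).
    return ti.xcom_pull(task_ids=beam_task_id, key="dataflow_job_id")


# Simulate an operator that has pushed its job id:
ti = FakeTaskInstance(
    {("start_python_pipeline", "dataflow_job_id"): "2023-03-09_12_34_56-123456"}
)
print(get_dataflow_job_id(ti))  # → 2023-03-09_12_34_56-123456
```

In a real DAG, `ti` would come from the task context (e.g. `**kwargs["ti"]` in a PythonOperator callable) rather than a stub.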
Have you tested this? The documentation is inconsistent and not reliable to go off of alone. For example, the documentation you referenced states:
and dataflow_job_id is not actually in the XCom.
You are right. I was asked to take a look at this issue and didn't have a chance to read the issue description in detail. The Dataflow job id is indeed only available after a Dataflow job finishes successfully. In a perfect world where no issue occurs, this is fine, but in the real world, when a Dataflow job gets cancelled, there is no job id to track the cancelled job.
I can take on this issue |
@zeotuan All yours!
This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Kindly recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
(#42982) To let GCP Beam Sensor operators 'sense' pipeline changes by having the dataflow job_id pushed to XCom as soon as it is available. Related issue: #30007. Co-authored-by: Oleg Kachur <[email protected]>
Fixed in #42982: the Dataflow job id now gets pushed to XCom as soon as it is available, and can be retrieved from there. P.S. Can you close the issue please? @CYarros10 @josh-fell
Apache Airflow version
2.5.1
What happened
BeamRunPythonPipelineOperator does not push values to XCom when the pipeline starts, but Dataflow Sensors work like this:
Since the only way to retrieve the Dataflow Job ID from a BeamRunPythonPipelineOperator is through XCom, and the operator does not push this XCom until the pipeline ends, the Sensor can't "sense". It can only observe jobs that are already done.
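The timing problem described above can be illustrated with a toy, Airflow-free simulation (all names are hypothetical): the operator writes the job id to XCom only when the pipeline finishes, so a sensor polling mid-run reads nothing.

```python
# A dict stands in for Airflow's XCom table.
xcom = {}


def run_pipeline_old_behavior():
    """Simulates the pre-fix operator: push happens only at completion."""
    job_id = "2023-03-09_12_34_56-123456"  # assigned by Dataflow at launch
    # ... pipeline runs for a long time; nothing is pushed during the run ...
    xcom["dataflow_job_id"] = job_id  # the push only happens here, at the end


def sensor_poke():
    """What a Dataflow sensor would pull on each poke."""
    return xcom.get("dataflow_job_id")


# While the pipeline is "running", the sensor has nothing to sense:
print(sensor_poke())  # → None
run_pipeline_old_behavior()
print(sensor_poke())  # job id only visible after completion
```

Pushing the job id at launch time (as the eventual fix in #42982 does) makes the first poke meaningful instead of returning `None`.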
Error Message:
jinja2.exceptions.UndefinedError: 'None' has no attribute 'dataflow_job_config'
BeamRunPythonPipelineOperator XCom (after completion):
What you think should happen instead
The Dataflow Job ID should be pushed to XCom when, or before, the pipeline starts.
How to reproduce
Sample Code
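The reporter's sample code is not preserved in this thread. As a hedged sketch of the failing pattern only — it assumes the apache-beam and google provider packages are installed, and the bucket, project, region, and task ids are placeholders — a sensor templated on the operator's XCom renders empty while the pipeline is still running, which is why it can only see finished jobs:

```python
# Hedged reproduction sketch, not runnable as-is: requires Airflow with the
# apache-beam and google provider packages plus real GCP resources.
import pendulum
from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.hooks.dataflow import DataflowJobStatus
from airflow.providers.google.cloud.sensors.dataflow import DataflowJobStatusSensor

with DAG(
    dag_id="beam_sensor_repro",
    start_date=pendulum.datetime(2023, 3, 1, tz="UTC"),
    schedule=None,
) as dag:
    start_pipeline = BeamRunPythonPipelineOperator(
        task_id="start_python_pipeline",
        runner="DataflowRunner",
        py_file="gs://my-bucket/pipeline.py",  # placeholder path
    )

    # The sensor templates its job_id from the operator's XCom. Before the
    # fix, nothing is pushed until the pipeline completes, so this renders
    # empty for the whole duration of the run.
    wait_for_job = DataflowJobStatusSensor(
        task_id="wait_for_dataflow_job",
        job_id="{{ task_instance.xcom_pull(task_ids='start_python_pipeline', key='dataflow_job_id') }}",
        expected_statuses={DataflowJobStatus.JOB_STATE_RUNNING},
        location="us-central1",  # placeholder region
        project_id="my-project",  # placeholder project
    )

    start_pipeline >> wait_for_job
```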
Operating System
composer-2.1.5-airflow-2.4.3
Versions of Apache Airflow Providers
2.4.3
Deployment
Google Cloud Composer
Deployment details
No response
Anything else
Occurs every time
Are you willing to submit a PR?
Code of Conduct