SparkKubernetesOperator fails to fetch the driver pod when SparkApplication is still in pending state #40495
Labels
area:providers
good first issue
kind:bug
provider:apache-spark
provider:cncf-kubernetes
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==8.3.1
Apache Airflow version
2.9.2
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Other Docker-based deployment
Deployment details
Airflow is deployed on Kubernetes. A customized Spark Operator with a custom API group is deployed on the same cluster.
What happened
I have a DAG with a SparkKubernetesOperator task, which is failing. The task fails because it cannot find the driver pod, but the SparkApplication is still in the pending state at that moment. The pod appears 1-2 minutes after the SparkKubernetesOperator task has failed.
What you think should happen instead
The SparkKubernetesOperator task should wait for the SparkApplication to reach the running state, and the DAG should complete successfully.
How to reproduce
Prepare an environment with Airflow on Kubernetes and Spark Operator deployed on the same cluster.
Create a DAG that consists of two files:
spark_pi.py
spark_pi.yaml
Launch this DAG via the Airflow UI, then check the status and log of the failed task.
Anything else
Logs of failed SparkKubernetesOperator task
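To illustrate the behavior I expect, here is a minimal sketch of a wait loop that polls the SparkApplication state before looking for the driver pod. This is not the provider's actual implementation. The function name `wait_for_application`, its parameters, and the `get_state` callable are hypothetical, and the state names are assumptions modeled on the values the Spark Operator reports in `status.applicationState.state`.

```python
import time
from typing import Callable

# Assumed state names (not taken from the provider code); adjust them to
# match the Spark Operator version running in the cluster.
READY_STATES = {"RUNNING", "COMPLETED"}
FAILED_STATES = {"FAILED", "SUBMISSION_FAILED"}


def wait_for_application(
    get_state: Callable[[], str],
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the application is ready, fails, or times out.

    get_state is a hypothetical callable returning the current
    SparkApplication state as a string. Any state outside READY_STATES and
    FAILED_STATES (for example an empty string or "SUBMITTED") is treated
    as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in READY_STATES:
            # The driver pod should exist now, so it is safe to fetch it.
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"SparkApplication entered state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"SparkApplication still in state {state!r} after {timeout}s"
            )
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated state sequence: pending, then submitted, then running.
    states = iter(["", "SUBMITTED", "RUNNING"])
    print(wait_for_application(lambda: next(states), sleep=lambda _: None))
```

In a real task, `get_state` would presumably read the SparkApplication custom resource from the cluster (for example through the Kubernetes Python client's `CustomObjectsApi`), and the driver pod lookup would only run after this function returns.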
Are you willing to submit PR?
Code of Conduct