I deployed a simple Spark application on Kubernetes with the following configuration:
spark.executor.instances=2
spark.executor.memory=8g
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.executor.cores=2
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=2
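For context, I pass these as --conf options to spark-submit, roughly like this (a sketch only; the API server address, container image and application jar below are placeholders for my actual values):

$ spark-submit \
    --master k8s://https://<k8s-apiserver>:<port> \
    --name spark-k8sdemo \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.executor.instances=2 \
    --conf spark.executor.memory=8g \
    --conf spark.executor.cores=2 \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=2 \
    --conf spark.dynamicAllocation.maxExecutors=2 \
    local:///opt/spark/examples/jars/<my-app>.jar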
The memory requirements of the executor pods exceed what is available on the Kubernetes cluster, so the executor pods always stay in the Pending state, as shown below.
$ kubectl get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/spark-k8sdemo-6e66d576f655b1f5-exec-1  0/1     Pending   0          10m
pod/spark-k8sdemo-6e66d576f655b1f5-exec-2  0/1     Pending   0          10m
pod/spark-master-6d9bc767c6-qsk8c          1/1     Running   0          10m
I know the reason is the non-availability of resources, as shown by the kubectl describe command:
$ kubectl describe pod/spark-k8sdemo-6e66d576f655b1f5-exec-1
Events:
Type     Reason            Age                 From               Message
----     ------            ----                ----               -------
Warning  FailedScheduling  28s (x12 over 12m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
On the other hand, the driver pod keeps waiting forever for the executor pods to get enough resources, as shown below.
$ kubectl logs pod/spark-master-6d9bc767c6-qsk8c
21/01/12 11:36:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/01/12 11:37:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/01/12 11:37:16 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Now my question: is there some way to make the driver wait only for a limited time (or a number of retries), and, if the executors still don't get resources, have the driver pod terminate itself and print a proper message/log, e.g. "application aborted as there were no resources in the cluster"?
I went through all the Spark configurations for the above requirement but couldn't find any. In YARN we have spark.yarn.maxAppAttempts, but I found nothing similar for Kubernetes.
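For comparison, on YARN I would simply cap the number of application attempts, roughly like this (the class name and jar are placeholders):

$ spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.maxAppAttempts=2 \
    --class <main-class> <my-app>.jar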
If no such configuration is available in Spark, is there a way in the Kubernetes pod definition to achieve the same?
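For example, one crude idea I considered (I'm not sure it is the right approach) is to put a hard deadline on the driver pod itself via activeDeadlineSeconds, e.g. through a driver pod template; the file name below is a placeholder:

$ cat > driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  # Kubernetes kills the pod if it runs longer than 10 minutes, for any reason
  activeDeadlineSeconds: 600
EOF

and then pointing Spark at it with --conf spark.kubernetes.driver.podTemplateFile=driver-pod-template.yaml (or setting activeDeadlineSeconds directly in the driver pod spec). But that is a blunt timeout that would also kill healthy long-running jobs, not "abort because executors could not be scheduled", so I'm hoping there is something better.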