
I have followed the instructions below, except that instead of minikube I used a Google Cloud Platform Kubernetes cluster (Spark 2.3.2):

https://testdriven.io/blog/deploying-spark-on-kubernetes/

When I submit Spark jobs with:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://spark-master:7077 \
  --executor-memory 471859200 \
  --total-executor-cores 20 \
  --deploy-mode cluster \
  /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar \
  10
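
For reference, 471859200 bytes is exactly 450 MiB (450 * 1024 * 1024), so the same request can also be written with a unit-suffixed JVM memory string. The variant below is only an equivalent sketch of the command above with the memory spelled out explicitly; it does not change what is being requested:

# same submit as above, with --executor-memory written as 450m instead of a raw byte count
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://spark-master:7077 \
  --executor-memory 450m \
  --total-executor-cores 20 \
  --deploy-mode cluster \
  /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar \
  10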

or simply open a Spark shell using:

/opt/spark/bin/spark-shell --master spark://spark-master:7077
sc.makeRDD(List(1,2,4,4)).count
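
As a quick diagnostic from that same shell, SparkContext can report which executors have registered with the driver (if no workers have been accepted, only the driver's own entry shows up). This is just a sketch using the standard sc.getExecutorMemoryStatus call:

// run inside spark-shell: prints one line per registered block manager (driver + executors)
sc.getExecutorMemoryStatus.foreach { case (endpoint, (maxMem, free)) =>
  println(s"$endpoint -> max ${maxMem / 1024 / 1024} MB, free ${free / 1024 / 1024} MB")
}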

I keep getting the WARN messages below:

2020-04-18 21:14:38 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2020-04-18 21:14:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

On the Spark UI, I can see all my worker nodes, which I can easily scale via:

kubectl scale deployment spark-worker --replicas 2 (or any other number, works fine)
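
On the Kubernetes side, the pods and their logs can also be checked directly; a minimal sketch (the deployment name spark-worker is taken from the scale command above, the rest is generic kubectl):

# list the Spark pods and confirm the workers are in Running state
kubectl get pods
# pull the log of one pod from the spark-worker deployment
kubectl logs deployment/spark-worker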

I see a new running application on the Spark UI, and its executor count keeps growing; I saw it go up to 309 executors before I killed the job from the Spark UI.

Local mode runs successfully:

/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[2] /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar 10

I run all my spark-submit commands from the Spark master Kubernetes pod:

kubectl exec -it spark-master-dc7d76bf5-dthvn bash

What am I doing wrong? Please let me know what other system details you want from me. Thanks.

Edit: adding a Spark UI screenshot of the Executors page.

Worker log: https://drive.google.com/file/d/1xU07m_OB1BEzJXyJ30WzvA5vcrpVmxYj/view?usp=sharing

Master log: spark on K8 master log

Hi, can you share master and worker pod logs? - Aliaksandr Sasnouskikh
@AliaksandrSasnouskikh Hi, I uploaded the 2 log files. Thanks. - sumon c
Hm, looks a bit weird. I will try to reproduce and let you know how it goes. Hope to get to it by the end of this week. - Aliaksandr Sasnouskikh
@sumonc, any updates? - pedromorfeu
No update from my side. Luckily for me it was a POC; best wishes to anyone who is stuck with this in a real-life project. - sumon c

1 Answer


Hi, the problem could be the number of executor instances set on the spark-submit. When only 1 executor instance is requested, the driver will run on that executor and there will not be any executor left to run the tasks. We need to set the number of executors to at least 2 so that the driver runs on one and the tasks get the other.

I was using the command below initially, and after changing the number of executors it worked.

bin/spark-submit \
  --name test \
  --master k8s://https://*****:6443 \
  --deploy-mode cluster \
  --class com.classname \
  --conf spark.kubernetes.driver.pod.name=test \
  --conf spark.kubernetes.authenticate.submission.caCertFile=/etc/kubernetes/pki/ca.crt \
  --conf spark.kubernetes.authenticate.driver.caCertFile=/etc/kubernetes/pki/ca.crt \
  --conf spark.kubernetes.authenticate.executor.caCertFile=/etc/kubernetes/pki/ca.crt \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=default \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  local:///opt/spark/examples/jars/examplejar.jar
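
With spark.executor.instances=3 as above, a quick sanity check is that Kubernetes shows one driver pod plus three executor pods once the job is running. A rough sketch only; executor pod names are generated by Spark and will differ:

# the driver pod is named by spark.kubernetes.driver.pod.name (here "test");
# expect three additional executor pods alongside it
kubectl get pods -n default
# follow the driver log to watch the tasks run
kubectl logs -f test -n default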