I'm trying to submit a Spark application on a cluster with the following specs on GCP Dataproc:
- 3 worker nodes, each with 15GB RAM and 4 cores
- 1 master node with 7.5GB RAM and 2 cores
Following the guides I found on memory and executor tuning on YARN, I derived the following values for the application parameters:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()
As far as spark.executor.memory is concerned, I should be well within the limits, since I've reserved 1GB of RAM for the OS and Hadoop daemons. Considering memory overhead, my limit should be
max(384MB, 0.07 * spark.executor.memory) -> max(384MB, 0.07 * 14GB) = max(384MB, 0.98GB) ≈ 1GB
so 15GB - 2GB = 13GB, and I specified 10GB just to be safe.
Available cores are 4 - 1 = 3, since, as I said, 1 core is reserved.
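For reference, here is the same arithmetic written out as a small sketch (plain Python, applying the max(384MB, 0.07 * spark.executor.memory) rule of thumb above to the requested 10g; note that recent Spark versions default the overhead factor to 0.10 rather than 0.07):

# Per-executor memory budget on a 15GB / 4-core worker, following the
# max(384MB, 0.07 * spark.executor.memory) rule of thumb.
def overhead_mb(executor_memory_mb, factor=0.07, minimum_mb=384):
    return max(minimum_mb, factor * executor_memory_mb)

node_mb = 15 * 1024              # physical RAM per worker
reserved_mb = 1 * 1024           # reserved for the OS and Hadoop daemons
usable_mb = node_mb - reserved_mb

executor_mb = 10 * 1024          # spark.executor.memory = 10g
container_mb = executor_mb + overhead_mb(executor_mb)

print(f"usable per node:   {usable_mb} MB")          # 14336 MB
print(f"container request: {container_mb:.0f} MB")   # 10240 + ~717 MB overhead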
I would expect to see 3 executors in the application UI, but I only get 2. I also tried specifying spark.executor.instances=2 instead of 3, to no avail.
Am I missing something?
Thanks
spark.dynamicAllocation.enabled=false? Also note that the YARN NodeManager doesn't get all 15GB of the worker memory. – Dagang
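Following up on that comment, here is a minimal sketch (assuming the SparkSession is the spark object built above) to check which settings actually took effect; Dataproc enables dynamic allocation by default, in which case YARN scales the executor count up and down rather than fixing it at spark.executor.instances:

# Inspect the settings in the running session. If dynamic allocation is
# "true" (the Dataproc default), the executor count is managed by YARN.
for key in ("spark.dynamicAllocation.enabled",
            "spark.executor.instances",
            "spark.executor.memory",
            "spark.executor.cores"):
    print(key, "=", spark.conf.get(key, "not set"))

As for the second point, the per-node memory YARN can actually allocate is yarn.nodemanager.resource.memory-mb, which you can read from /etc/hadoop/conf/yarn-site.xml on a worker or from the ResourceManager web UI; it will be noticeably less than the machine's physical 15GB.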