
I'm trying to submit a Spark application on a GCP Dataproc cluster with the following specs:

  • 3 worker nodes, each with 15GB RAM and 4 cores
  • 1 master node with 7.5GB RAM and 2 cores

Following the guides I found on memory and executor tuning on YARN, I derived the following values for the application parameters:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()
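
To double-check which values the session actually applied (cluster defaults can override them), the configuration can be read back; a minimal check using the standard SparkConf API:

# Read back the settings the running session actually applied
conf = spark.sparkContext.getConf()
for key in ("spark.executor.instances", "spark.executor.memory", "spark.executor.cores"):
    print(key, "=", conf.get(key, "not set"))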

As far as spark.executor.memory is concerned, I should be well within the limits, since I've reserved 1GB of RAM per node for the OS and Hadoop daemons. Accounting for memory overhead, my limit should be

max(384MB, 0.07 * spark.executor.memory) ---> max(384MB, 0.07 * 14GB) = max(384MB, 0.98GB) ≈ 1GB

so 15GB - 2GB = 13GB per node, and I specified 10GB just to be safe.
Available cores are 4 - 1 = 3, since, as I said, 1 core per node is reserved.
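
The same arithmetic spelled out (just a sketch of the reasoning above; 0.07 is the overhead factor from the guide I followed, newer Spark versions default spark.executor.memoryOverhead to 10%):

# Sizing arithmetic for one 15GB / 4-core worker node
node_ram_gb = 15
reserved_gb = 1                                   # OS + Hadoop daemons
usable_gb = node_ram_gb - reserved_gb             # 14GB

overhead_gb = max(384 / 1024, 0.07 * usable_gb)   # max(0.375, 0.98) ~= 1GB
executor_cap_gb = usable_gb - overhead_gb         # ~13GB left for the executor
usable_cores = 4 - 1                              # 1 core reserved

print(executor_cap_gb, usable_cores)              # ~13.02 3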

I would expect to see 3 executors in the application UI, but I only get 2. I also tried specifying spark.executor.instances=2 instead of 3, to no avail.

Am I missing something?

Thanks

Can you try adding spark.dynamicAllocation.enabled=false? Also note that the YARN NodeManager doesn't get all 15GB of the worker's memory. – Dagang
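
A minimal sketch of that suggestion applied to the snippet above (Dataproc enables dynamic allocation by default, which lets YARN scale the executor count regardless of spark.executor.instances; the property below is standard Spark config, the rest mirrors the original builder):

from pyspark.sql import SparkSession

# Same session as above, with dynamic allocation explicitly disabled
# so the fixed spark.executor.instances=3 is respected
spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.dynamicAllocation.enabled", "false") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()

If fewer executors still come up, yarn.nodemanager.resource.memory-mb on the workers is worth checking, since, as the comment notes, YARN does not see the full 15GB of each node.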