3 votes

I'm running a Dataproc job using n1-highmem-16 machines, which have 104 GB of memory each.

I double-checked the size of the instances in the Google Console, and all workers and the master are indeed n1-highmem-16.

Nevertheless, I get this error:

Container killed by YARN for exceeding memory limits. 56.8 GB of 54 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

Why is YARN not using all 104 GB of memory?


2 Answers

7 votes

Dataproc configures its memory settings to fit two Spark executors per machine, so each container is expected to be roughly half the capacity of each NodeManager.

You can optionally override spark.executor.memory (and, if needed, spark.yarn.executor.memoryOverhead) along with spark.executor.cores to change how executors are packed onto each machine. spark.executor.cores defaults to half the machine's cores, since half the machine's memory goes to each executor. In your case this means each Spark executor tries to run 8 tasks in parallel in the same process.
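If you want to verify this on a worker, the generated values can be read straight from the cluster configs; a quick sketch, assuming the standard Dataproc config locations under /etc/hadoop/conf and /etc/spark/conf:

    # Total memory this NodeManager offers to YARN containers
    grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml

    # Per-executor sizing Dataproc derived from the machine type; executor memory
    # plus memoryOverhead should come out to roughly half the NodeManager value
    # above, i.e. the ~54 GB container limit quoted in the error
    grep -E 'spark\.executor\.(memory|cores)|spark\.yarn\.executor\.memoryOverhead' \
        /etc/spark/conf/spark-defaults.conf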

You can effectively increase memory per task by reducing executor cores while keeping everything else the same; for example, spark.executor.cores=6 raises per-task memory by about 33%, since each executor still gets the same heap but now runs 6 tasks instead of 8. These properties can be specified at job-submission time:

gcloud dataproc jobs submit spark --properties spark.executor.cores=6
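A full command also needs the target cluster and the job artifact; a minimal sketch (cluster name, main class, and jar path are placeholders) that additionally raises the overhead the error message mentions:

    gcloud dataproc jobs submit spark \
        --cluster=my-cluster \
        --class=com.example.MyJob \
        --jars=gs://my-bucket/my-job.jar \
        --properties=spark.executor.cores=6,spark.yarn.executor.memoryOverhead=4096

Dropping from 8 to 6 cores leaves each executor's memory untouched but splits it across fewer concurrent tasks, at the cost of two idle cores per executor.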
0 votes

I was getting the "Container killed by YARN for exceeding memory limits" error even when physical memory was still available on the high-memory instances.

I had to add "yarn.nodemanager.vmem-check-enabled=false" when creating the cluster:

 gcloud beta dataproc clusters create highmem-cluster --properties="yarn:yarn.nodemanager.vmem-check-enabled=false" ...

After that, I was able to run the job with the following executor settings on n1-highmem-16:

spark.executor.cores=15,spark.executor.memory=70g
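For reference, a sketch of passing those properties at submit time (main class and jar path are placeholders):

    gcloud dataproc jobs submit spark \
        --cluster=highmem-cluster \
        --class=com.example.MyJob \
        --jars=gs://my-bucket/my-job.jar \
        --properties=spark.executor.cores=15,spark.executor.memory=70g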