I am creating a cluster in google dataproc with the following characteristics:
Master Standard (1 master, N workers)
Machine n1-highmem-2 (2 vCPU, 13.0 GB memory)
Primary disk 250 GB
Worker nodes 2
Machine type n1-highmem-2 (2 vCPU, 13.0 GB memory)
Primary disk size 250 GB
I am also adding, under Initialization actions, the .sh file from this repository in order to use Zeppelin.
The code that I use works fine with some data, but with a bigger amount of data I get the following error:
Container killed by YARN for exceeding memory limits. 4.0 GB of 4 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
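For context on the error: the 4 GB limit is the YARN container size, which covers the executor's JVM heap plus an off-heap overhead, and the overhead defaults to roughly max(384 MB, 10% of executor memory). Following the message's own suggestion, one way to raise the overhead is at job submission time. This is a sketch, not the poster's actual job: my_job.py and the sizes shown are illustrative placeholders, and spark.yarn.executor.memoryOverhead is the Spark 2.x property name used in the error message.

```shell
# Sketch: raise the off-heap allowance the error message refers to.
# Values are illustrative, not tuned recommendations.
spark-submit \
  --conf spark.executor.memory=3g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  my_job.py
```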
I have seen posts such as this one: "Container killed by YARN for exceeding memory...", where it is recommended to change yarn.nodemanager.vmem-check-enabled to false.
I am a bit confused though. Are all these configurations happening when I initialize the cluster or not?
Also, where exactly is yarn-site.xml located? I am unable to find it on the master (I can't find it in /usr/lib/zeppelin/conf/, /usr/lib/spark/conf, or /usr/lib/hadoop-yar/) in order to change it, and if I change it, what do I need to 'restart'?
– Igor Dvorzhak: gcloud dataproc clusters create <my-cluster> --properties yarn:yarn.nodemanager.vmem-check-enabled=false

– Igor Dvorzhak: yarn-site.xml is located here: /etc/hadoop/conf.empty/yarn-site.xml. After updating it you can restart YARN with the command: sudo systemctl restart hadoop-yarn-resourcemanager.service
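Putting the comments together, an edit-and-restart sequence on a running cluster might look like the sketch below. Note the hedges: the config path and the resourcemanager restart come from the comments above; the nodemanager restart on workers is an assumption on my part, since yarn.nodemanager.vmem-check-enabled is enforced by the NodeManager on each worker, not by the ResourceManager.

```shell
# On the master node: add the property inside <configuration> in
# /etc/hadoop/conf.empty/yarn-site.xml (path from the comment above):
#   <property>
#     <name>yarn.nodemanager.vmem-check-enabled</name>
#     <value>false</value>
#   </property>
sudo nano /etc/hadoop/conf.empty/yarn-site.xml
sudo systemctl restart hadoop-yarn-resourcemanager.service

# Assumed additionally necessary: on each worker node, where the
# NodeManager actually enforces the memory check, make the same edit and:
sudo systemctl restart hadoop-yarn-nodemanager.service
```

Setting the property at cluster creation time with --properties, as in the first comment, avoids this manual editing entirely because Dataproc writes it into yarn-site.xml on every node before services start.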