0 votes

I have a cluster on EMR (emr-5.20.0) with one m5.2xlarge as the master node, two m4.large as core nodes, and three m4.large as worker nodes. The total RAM of this cluster is 62GB, but in the YARN UI the total memory displayed is 30GB.

Can somebody help me understand how this value is calculated?

I have already checked the configuration in yarn-site.xml and spark-defaults.conf, and they are configured according to the AWS recommendation: https://docs.aws.amazon.com/pt_br/emr/latest/ReleaseGuide/emr-hadoop-task-config.html#emr-hadoop-task-config-m5

Any help is welcome.

2
As @michal-lemay said, you don't include the master node, only the worker nodes, so you're using the wrong recommended settings. – tk421

2 Answers

1 vote

The memory available to YARN can be configured using the following cluster parameters:

yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.increment-allocation-mb
yarn.scheduler.maximum-allocation-mb

By tweaking these parameters you can increase or decrease the total memory allocated to the cluster, as in the sketch below.
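
For example, here is a minimal sketch of an EMR configuration JSON that overrides these values at cluster creation time (the numbers shown are the AWS defaults for m4.large from the page linked in the question and are purely illustrative; adjust them for your instance types):

    [
      {
        "Classification": "yarn-site",
        "Properties": {
          "yarn.nodemanager.resource.memory-mb": "6144",
          "yarn.scheduler.minimum-allocation-mb": "32",
          "yarn.scheduler.maximum-allocation-mb": "6144"
        }
      }
    ]

You can pass this via the --configurations flag of aws emr create-cluster or in the console's software settings. Note that yarn.nodemanager.resource.memory-mb is a per-node setting, so the cluster total is that value multiplied by the number of core/task nodes.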

1 vote

YARN does not include the master node in its available memory/cores.

So you should count roughly 5 x 8GB (m4.large). You will get less than that because some memory is reserved for the OS and services: per the AWS defaults, an m4.large gives 6144 MB (yarn.nodemanager.resource.memory-mb) to YARN, so 5 x 6GB = 30GB, which is exactly what the YARN UI shows.
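
If you want to verify this yourself, a quick sketch using the YARN ResourceManager REST API (run from anywhere that can reach port 8088 on the master; replace <master-public-dns> with your cluster's master address):

    # Query cluster-wide metrics from the ResourceManager
    curl http://<master-public-dns>:8088/ws/v1/cluster/metrics

The JSON response includes a totalMB field, which is the same figure the YARN UI displays, along with availableMB and allocatedMB so you can see how much of it is currently in use.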