2 votes

I am trying to execute a Spark job with the default settings in AWS EMR, which means the default container memory is 1.4 GB. For some tables it works fine, but when we deal with large-volume tables we get the error below.

diagnostics: Application application_1527024725057_17128 failed 2 times due to AM Container for appattempt_1527024725057_17128_000002 exited with exitCode: -104 For more detailed output, check application tracking page: http://ip-10-179-106-153.us-west-2.compute.internal:8088/cluster/app/application_1527024725057_17128 Then, click on links to logs of each attempt. Diagnostics: Container [pid=12379,containerID=container_1527024725057_17128_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.7 GB of 6.9 GB virtual memory used. Killing container.

When the data goes beyond 1.4 GB, the YARN ResourceManager kills the container and the job finishes with a failed status. I need some help with the above issue.

Changing property values in yarn-site.xml (such as the memory overhead and container memory) also didn't work. What are the ideal configurations for dealing with large volumes of data, other than maximizing the cluster size?
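For reference, the kind of yarn-site.xml properties I experimented with look like the following (the property names are the standard YARN container-sizing ones; the values here are only illustrative, not the exact ones from our cluster):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value> <!-- total memory YARN may allocate on each node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>12288</value> <!-- largest single container YARN will grant -->
</property>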

1
Why are you trying to run a big data application designed for a much larger machine in 1.4 GB of RAM? 2 GB of RAM is small today; in the big data world it is tiny. My iPhone has 3 GB of RAM. If you have 1.4 GB of physical memory and you are using 3.7 GB of virtual memory, you have severely overloaded your instance. You need to step up to at least 5.1 GB of memory (which means an 8 GB instance). - John Hanley
We are using a 4-node dev cluster with 80 GB in total. We will definitely have a bigger one going forward in QA/Prod, but since we are dealing with only sample data (not more than 30 GB in total), there is no need for a bigger cluster yet. I am not sure how to utilize the maximum available memory and am looking for the right configuration. - Ken Adams

1 Answer

0 votes

Tuning the executor and driver memory helped me:

spark-submit --deploy-mode cluster --executor-memory 4g --driver-memory 4g s3://mybucket/myscript.py
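
For what it's worth, this likely works because in --deploy-mode cluster the YARN application master container hosts the Spark driver, so --driver-memory is what grows the 1.4 GB AM container from the error message; the yarn-site.xml limits only cap what may be requested, they don't change what Spark actually asks for. If a container is still killed for exceeding physical memory, the off-heap overhead can be raised as well. A sketch, assuming Spark on YARN (the memoryOverhead settings are standard Spark ones; the 1024 MB values are illustrative, and the bucket/script path is carried over from the command above):

spark-submit --deploy-mode cluster \
  --executor-memory 4g --driver-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  s3://mybucket/myscript.py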