I am trying to execute a Spark job with the default settings on AWS EMR, which means the default container memory is 1.4 GB. For some tables it works fine, but when we deal with large-volume tables we get the error below.
diagnostics: Application application_1527024725057_17128 failed 2 times due to AM Container for appattempt_1527024725057_17128_000002 exited with exitCode: -104 For more detailed output, check application tracking page: http://ip-10-179-106-153.us-west-2.compute.internal:8088/cluster/app/application_1527024725057_17128 Then, click on links to logs of each attempt. Diagnostics: Container [pid=12379,containerID=container_1527024725057_17128_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.7 GB of 6.9 GB virtual memory used. Killing container.
When the data goes beyond 1.4 GB, the YARN ResourceManager kills the container and the application finishes with FAILED status. I need some help with this issue.
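Since it is the ApplicationMaster container (the _000001 container of the attempt) that is being killed, my understanding is that its size is controlled per job on the Spark side rather than per table. Below is a minimal sketch of what I have in mind, assuming cluster deploy mode (where the driver runs inside the AM); the values and the my_job.py file name are placeholders, not my actual job:

```
# Sketch only: raising the AM/driver container above the ~1.4 GB default.
# Property names are standard Spark-on-YARN settings; values are examples.
spark-submit \
  --deploy-mode cluster \
  --driver-memory 4g \
  --conf spark.yarn.driver.memoryOverhead=512 \
  my_job.py
```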
Changing property values in yarn-site.xml (such as memory overhead and container memory) also didn't work. What are the ideal configurations when dealing with a large volume of data, other than maximizing the cluster size? See the sketch below for the kind of per-job settings I am considering instead of cluster-wide yarn-site.xml edits.
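This is a sketch of the executor-side settings I understand are the usual starting point for large inputs, passed per job at submit time; the property names are the standard Spark-on-YARN ones, but the values are placeholders and I am not sure these are the right knobs, which is what I am asking about:

```
# Sketch only: per-job executor tuning instead of editing yarn-site.xml.
# Values are illustrative, not recommendations.
spark-submit \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --conf spark.sql.shuffle.partitions=400 \
  my_job.py
```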