0 votes

I am running a PySpark job on an AWS EMR cluster with the following configuration: one master instance (m5.2xlarge) and five slave instances (m5.2xlarge: 8 vCores, 32 GiB memory, EBS-only storage, 200 GiB EBS).

After I submit the PySpark job, it fails with the error below.

ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 24.1 GB of 24 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

Below is the spark-submit command.

spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --num-executors 2 \
  --executor-cores 5 \
  --executor-memory 21g \
  --driver-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=3g \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --conf spark.yarn.maxAppAttempts=100 \
  --conf spark.executor.extraJavaOptions=-Xss3m \
  --conf spark.driver.maxResultSize=3g \
  --conf spark.dynamicAllocation.enabled=false
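Note on where the 24 GB in the error comes from: YARN sizes each executor container as --executor-memory plus spark.yarn.executor.memoryOverhead, so this submission requests 21g + 3g = 24g per executor, which is exactly the limit the error reports. To confirm the container ceiling on the nodes (24576 MB on these m5.2xlarge instances, if I read EMR's defaults correctly), one way is to look it up in yarn-site.xml on a cluster node (the path below is the usual EMR location), for example:

grep -A1 -E 'yarn.nodemanager.resource.memory-mb|yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml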

Please suggest better values for the number of executors, executor memory, and number of cores.

2 Answers

1 vote

I cannot increase --executor-memory or spark.yarn.executor.memoryOverhead any further, because their total would hit the maximum container size (24576 MB).

The issue was resolved after increasing --num-executors to 5.
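For reference, a sketch of the adjusted submission, assuming everything else stays as posted in the question (the application script and its arguments were omitted in the original post, so they are omitted here too):

# Only --num-executors changes, from 2 to 5 (one ~24 GB container per core node)
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --num-executors 5 \
  --executor-cores 5 \
  --executor-memory 21g \
  --driver-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=3g \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --conf spark.yarn.maxAppAttempts=100 \
  --conf spark.executor.extraJavaOptions=-Xss3m \
  --conf spark.driver.maxResultSize=3g \
  --conf spark.dynamicAllocation.enabled=false

With five m5.2xlarge nodes offering roughly 24 GB of YARN memory each, this places one 24 GB executor container on each node.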

0 votes

One of your executor JVMs is running out of memory. As the error message says, consider boosting spark.yarn.executor.memoryOverhead from 3g to a larger value.

You can also increase --executor-memory to a bigger value if your application needs it.
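For example, something along these lines (the 20g/4g numbers are illustrative only; on this cluster the sum of --executor-memory and the overhead still has to fit in the ~24 GB container limit discussed above, so raising the overhead means lowering the heap unless the nodes have more YARN memory):

# Illustrative values: 20g heap + 4g overhead = 24g, still within a 24 GB container
# (other --conf flags and the application script as in the question)
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --num-executors 2 \
  --executor-cores 5 \
  --executor-memory 20g \
  --driver-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=4g \
  --conf spark.dynamicAllocation.enabled=false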

See the Spark on YARN properties here: https://spark.apache.org/docs/2.4.0/running-on-yarn.html