My Spark application, which runs on a YARN cluster, crashed and I am trying to determine the root cause. In the logs that I retrieved from YARN using yarn logs -applicationId <application_id>, I see a whole bunch of connection-refused errors during block fetches and a single OutOfMemory error, so it is hard to tell what the underlying cause is.
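For reference, this is roughly how I am scanning the aggregated logs; the grep pattern is just the phrases I happened to notice, nothing exhaustive:

yarn logs -applicationId application_1539650094881_0116 | grep -iE 'outofmemory|connection refused|killed by yarn'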
My question is: what happens when a container is killed because of an OutOfMemoryError? In the container logs I can see how the executor is started on the container:
exec /bin/bash -c "LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" $JAVA_HOME/bin/java -server -Xmx5120m '-DENVIRONMENT=pt' -Djava.io.tmpdir=$PWD/tmp '-Dspark.history.ui.port=18081' '-Dspark.driver.port=39112' -Dspark.yarn.app.container.log.dir=/hadoop/hdfs/drive5/hadoop/yarn/log/application_1539650094881_0116/container_e111_1539650094881_0116_01_000024 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:39112 --executor-id 13 --hostname slave3.hadoop.tsl.com --cores 5 --app-id application_1539650094881_0116 --user-class-path file:$PWD/app.jar 1> /hadoop/hdfs/drive5/hadoop/yarn/log/application_1539650094881_0116/container_e111_1539650094881_0116_01_000024/stdout 2> /hadoop/hdfs/drive5/hadoop/yarn/log/application_1539650094881_0116/container_e111_1539650094881_0116_01_000024/stderr"
The container in question hits an OutOfMemoryError at a later point. I notice the executor JVM is launched with -XX:OnOutOfMemoryError='kill %p', so on OOM the JVM should kill its own process. In that case, would Spark try to get a new container, or will this end up crashing the application?
I also see a lot of messages like the following:

Container killed by YARN for exceeding memory limits. 6.0 GB of 6 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead

There seem to be multiple of these up until the point the application crashed. Does Spark retry restarting the container up to some threshold number of times before failing the application?
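Based on that message, this is the kind of change I am considering; the values are just guesses I have not verified (spark.yarn.executor.memoryOverhead is in MB on this Spark version, and my understanding is that spark.yarn.max.executor.failures is the number of executor failures after which the whole application is failed; the class name is a placeholder):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.max.executor.failures=20 \
  --class com.example.MyApp app.jar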