0 votes

The Spark job, running in YARN mode, shows a few tasks failed with the following reason:

ExecutorLostFailure (executor 36 exited caused by one of the running tasks) Reason: Container marked as failed: container_xxxxxxxxxx_yyyy_01_000054 on host: ip-xxx-yy-zzz-zz. Exit status: -100. Diagnostics: Container released on a *lost* node

Any idea why this is happening?


3 Answers

5 votes

There are two main reasons.

  1. It may be because the memoryOverhead needed by the YARN container is not enough; the solution is to increase spark.executor.memoryOverhead (see the sketch after this list).
  2. It may be because the slave node's disk lacks space to write the tmp data required by Spark. Check your YARN usercache dir (on EMR, it is located at /mnt/yarn/usercache/),
    or run df -h to check your remaining disk space.
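A minimal PySpark sketch of raising that setting, assuming Spark 2.3+ where the key is spark.executor.memoryOverhead (older releases used spark.yarn.executor.memoryOverhead); the application name and the 2g value are illustrative assumptions, not values from the answer:

    from pyspark.sql import SparkSession

    # Give each executor's YARN container more off-heap headroom.
    # "2g" is only an illustrative starting point; tune it for your job.
    spark = (
        SparkSession.builder
        .appName("overhead-example")                    # hypothetical app name
        .config("spark.executor.memoryOverhead", "2g")
        .getOrCreate()
    )

The same value can also be supplied at launch time, e.g. spark-submit --conf spark.executor.memoryOverhead=2g.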
0 votes

Containers killed by the framework, either because they were released by the application or because they were 'lost' due to node failures etc., have a special exit code of -100. The node failure could be caused by the node not having enough disk space or executor memory.
