
I am using JupyterHub on an AWS EMR cluster, EMR version 5.16.

I submitted a Spark application from a PySpark3 notebook. The application writes about 1 TB of data to S3. I am using the EMR autoscaling feature to scale up the task nodes.

Hardware configuration:

1. Master node: 32 GB RAM, 16 cores
2. Core node: 32 GB RAM, 16 cores
3. Task nodes: 16 GB RAM, 8 cores each (scales up to 15 task nodes)

I have observed that the Spark application gets killed after running for 50 to 60 minutes. I tried debugging:

1. The cluster still had room to scale up, so it is not a shortage of resources.
2. The Livy session also gets killed.
3. In the job log I saw the error message RECVD TERM SIGNAL "Shutdown hook received"

Please note:

1. I have set `spark.dynamicAllocation.enabled=true`
2. I am using the YARN fair scheduler with user impersonation in JupyterHub

Can you please help me understand the problem and its solution?

hey did you find out how to solve this problem? – ZhouQuan

1 Answer


I think I faced the same problem, and I found the solution thanks to this answer.

The issue comes from the Livy configuration parameter livy.server.session.timeout, which sets the session timeout to 1 hour by default. That matches the roughly 1-hour lifetime you observed before the application was killed.

You should raise it by adding the following entry to the configurations of the EMR cluster.

[{"Classification": "livy-conf", "Properties": {"livy.server.session.timeout": "5h"}}]
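As a minimal sketch of one way to apply this, you can write the configuration block to a JSON file and pass it when creating the cluster. The file name `configurations.json` is just an example; the `Classification`/`Properties` shape is what the EMR configuration API expects:

```python
import json

# EMR application configuration raising Livy's session timeout from the
# 1-hour default to 5 hours.
configurations = [
    {
        "Classification": "livy-conf",
        "Properties": {"livy.server.session.timeout": "5h"},
    }
]

# Write the configuration to a file that can be referenced at cluster
# creation time, e.g.:
#   aws emr create-cluster ... --configurations file://configurations.json
with open("configurations.json", "w") as f:
    json.dump(configurations, f, indent=2)
```

Note that this takes effect at cluster creation; on an already-running cluster you would instead need to edit the Livy configuration on the master node and restart the Livy service.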

This solved the issue for me.