17
votes

We have a long-running EMR cluster to which we submit Spark jobs. I see that over time HDFS fills up with the Spark application logs, which sometimes causes a host to be flagged as unhealthy by EMR/YARN (as far as I can tell).

Running hadoop fs -ls -R -h / shows [1], which makes it clear that no application logs have ever been deleted.

We have set spark.history.fs.cleaner.enabled to true (validated this in the Spark UI) and were hoping that the other defaults, such as the cleaner interval (1 day) and the cleaner max age (7 days) described at http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options, would take care of cleaning up these logs. But that is not the case.
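For reference, here is a minimal way to double-check what the history server is actually running with (this assumes a stock EMR layout, where Spark's config lives in /etc/spark/conf and the history server listens on port 18080 on the master node):

# On the EMR master node: show the cleaner settings in the config the history server reads
grep 'spark.history.fs.cleaner' /etc/spark/conf/spark-defaults.conf

# List the applications the history server currently knows about via its REST API
curl -s http://localhost:18080/api/v1/applications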

Any ideas?

[1]

-rwxrwx---   2 hadoop spark      543.1 M 2017-01-11 13:13 /var/log/spark/apps/application_1484079613665_0001
-rwxrwx---   2 hadoop spark        7.8 G 2017-01-17 10:51 /var/log/spark/apps/application_1484079613665_0002.inprogress
-rwxrwx---   2 hadoop spark        1.4 G 2017-01-18 08:11 /var/log/spark/apps/application_1484079613665_0003
-rwxrwx---   2 hadoop spark        2.9 G 2017-01-20 07:41 /var/log/spark/apps/application_1484079613665_0004
-rwxrwx---   2 hadoop spark      125.9 M 2017-01-20 09:57 /var/log/spark/apps/application_1484079613665_0005
-rwxrwx---   2 hadoop spark        4.4 G 2017-01-23 10:19 /var/log/spark/apps/application_1484079613665_0006
-rwxrwx---   2 hadoop spark        6.6 M 2017-01-23 10:31 /var/log/spark/apps/application_1484079613665_0007
-rwxrwx---   2 hadoop spark       26.4 M 2017-01-23 11:09 /var/log/spark/apps/application_1484079613665_0008
-rwxrwx---   2 hadoop spark       37.4 M 2017-01-23 11:53 /var/log/spark/apps/application_1484079613665_0009
-rwxrwx---   2 hadoop spark      111.9 M 2017-01-23 13:57 /var/log/spark/apps/application_1484079613665_0010
-rwxrwx---   2 hadoop spark        1.3 G 2017-01-24 10:26 /var/log/spark/apps/application_1484079613665_0011
-rwxrwx---   2 hadoop spark        7.0 M 2017-01-24 10:37 /var/log/spark/apps/application_1484079613665_0012
-rwxrwx---   2 hadoop spark       50.7 M 2017-01-24 11:40 /var/log/spark/apps/application_1484079613665_0013
-rwxrwx---   2 hadoop spark       96.2 M 2017-01-24 13:27 /var/log/spark/apps/application_1484079613665_0014
-rwxrwx---   2 hadoop spark      293.7 M 2017-01-24 17:58 /var/log/spark/apps/application_1484079613665_0015
-rwxrwx---   2 hadoop spark        7.6 G 2017-01-30 07:01 /var/log/spark/apps/application_1484079613665_0016
-rwxrwx---   2 hadoop spark        1.3 G 2017-01-31 02:59 /var/log/spark/apps/application_1484079613665_0017
-rwxrwx---   2 hadoop spark        2.1 G 2017-02-01 12:04 /var/log/spark/apps/application_1484079613665_0018
-rwxrwx---   2 hadoop spark        2.8 G 2017-02-03 08:32 /var/log/spark/apps/application_1484079613665_0019
-rwxrwx---   2 hadoop spark        5.4 G 2017-02-07 02:03 /var/log/spark/apps/application_1484079613665_0020
-rwxrwx---   2 hadoop spark        9.3 G 2017-02-13 03:58 /var/log/spark/apps/application_1484079613665_0021
-rwxrwx---   2 hadoop spark        2.0 G 2017-02-14 11:13 /var/log/spark/apps/application_1484079613665_0022
-rwxrwx---   2 hadoop spark        1.1 G 2017-02-15 03:49 /var/log/spark/apps/application_1484079613665_0023
-rwxrwx---   2 hadoop spark        8.8 G 2017-02-21 05:42 /var/log/spark/apps/application_1484079613665_0024
-rwxrwx---   2 hadoop spark      371.2 M 2017-02-21 11:54 /var/log/spark/apps/application_1484079613665_0025
-rwxrwx---   2 hadoop spark        1.4 G 2017-02-22 09:17 /var/log/spark/apps/application_1484079613665_0026
-rwxrwx---   2 hadoop spark        3.2 G 2017-02-24 12:36 /var/log/spark/apps/application_1484079613665_0027
-rwxrwx---   2 hadoop spark        9.5 M 2017-02-24 12:48 /var/log/spark/apps/application_1484079613665_0028
-rwxrwx---   2 hadoop spark       20.5 G 2017-03-10 04:00 /var/log/spark/apps/application_1484079613665_0029
-rwxrwx---   2 hadoop spark        7.3 G 2017-03-10 04:04 /var/log/spark/apps/application_1484079613665_0030.inprogress
What EMR AMI version are you using? Are those container/executor logs? Are you using YARN mode? - jc mannem
@swaranga-sarma Have you been able to fix this problem? We have been running into something similar where our one long-running application never has its logs cleaned. - Interfector
@Interfector I think ferris-tseng is correct. Going to try it out. We are hitting similar issues. - Gaurav Shah
@GauravShah We've tried this solution, and it did not seem to do the trick. The reason is that our application is long-running: for cleanup to take place, the application needs to finish; logs are not rotated for running applications. We had to completely disable the Spark History Server. - Interfector
@Interfector I guess we are going to hit the same issue. I will look and see if I can find something else. - Gaurav Shah

1 Answer

26
votes

I ran into this issue on emr-5.4.0; setting spark.history.fs.cleaner.interval to 1h got the cleaner to run.

For reference, here is the end of my spark-defaults.conf file:

spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge  12h
spark.history.fs.cleaner.interval 1h

After you make the change, restart the Spark history server.
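On EMR you do this from the master node. A sketch, assuming a stock EMR install (older emr-5.x releases run the history server under upstart, while roughly emr-5.30.0 and later use systemd):

# older emr-5.x (Amazon Linux 1 / upstart)
sudo stop spark-history-server
sudo start spark-history-server

# newer releases (Amazon Linux 2 / systemd)
sudo systemctl restart spark-history-server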

Another clarification: setting these values at application run time (i.e. via --conf on spark-submit) has no effect. Either set them at cluster creation time via the EMR configuration API, or manually edit spark-defaults.conf, set these values, and restart the Spark history server.

Also note that logs are only cleaned up once the corresponding application run has finished. For instance, a long-running Spark streaming job will keep accumulating logs for its current run and none of them will be deleted while it runs; only when the job restarts (for example because of a deployment) do the older logs become eligible for cleanup.
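If you go the cluster-creation route, the configuration would look something like the following (a sketch using the standard spark-defaults classification and the values from above; adjust maxAge and interval to taste). It can be passed to aws emr create-cluster via the --configurations option:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.history.fs.cleaner.enabled": "true",
      "spark.history.fs.cleaner.maxAge": "12h",
      "spark.history.fs.cleaner.interval": "1h"
    }
  }
]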