We have long running EMR cluster where we submit Spark jobs. I see that over time the HDFS fills up with the Spark application logs which sometimes renders a host unhealthy as viewed by EMR/Yarn (?).
Running hadoop fs -R -h / shows [1] which clearly shows no application logs have ever been deleted.
We have set the spark.history.fs.cleaner.enabled to true (validated this in the Spark UI) and were hoping the other defaults like cleaner interval (1 day) and cleaner max age (7d) as mentioned at: http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options would take care of cleaning up these logs. But that is not the case.
Any ideas?
[1]
-rwxrwx--- 2 hadoop spark 543.1 M 2017-01-11 13:13 /var/log/spark/apps/application_1484079613665_0001
-rwxrwx--- 2 hadoop spark 7.8 G 2017-01-17 10:51 /var/log/spark/apps/application_1484079613665_0002.inprogress
-rwxrwx--- 2 hadoop spark 1.4 G 2017-01-18 08:11 /var/log/spark/apps/application_1484079613665_0003
-rwxrwx--- 2 hadoop spark 2.9 G 2017-01-20 07:41 /var/log/spark/apps/application_1484079613665_0004
-rwxrwx--- 2 hadoop spark 125.9 M 2017-01-20 09:57 /var/log/spark/apps/application_1484079613665_0005
-rwxrwx--- 2 hadoop spark 4.4 G 2017-01-23 10:19 /var/log/spark/apps/application_1484079613665_0006
-rwxrwx--- 2 hadoop spark 6.6 M 2017-01-23 10:31 /var/log/spark/apps/application_1484079613665_0007
-rwxrwx--- 2 hadoop spark 26.4 M 2017-01-23 11:09 /var/log/spark/apps/application_1484079613665_0008
-rwxrwx--- 2 hadoop spark 37.4 M 2017-01-23 11:53 /var/log/spark/apps/application_1484079613665_0009
-rwxrwx--- 2 hadoop spark 111.9 M 2017-01-23 13:57 /var/log/spark/apps/application_1484079613665_0010
-rwxrwx--- 2 hadoop spark 1.3 G 2017-01-24 10:26 /var/log/spark/apps/application_1484079613665_0011
-rwxrwx--- 2 hadoop spark 7.0 M 2017-01-24 10:37 /var/log/spark/apps/application_1484079613665_0012
-rwxrwx--- 2 hadoop spark 50.7 M 2017-01-24 11:40 /var/log/spark/apps/application_1484079613665_0013
-rwxrwx--- 2 hadoop spark 96.2 M 2017-01-24 13:27 /var/log/spark/apps/application_1484079613665_0014
-rwxrwx--- 2 hadoop spark 293.7 M 2017-01-24 17:58 /var/log/spark/apps/application_1484079613665_0015
-rwxrwx--- 2 hadoop spark 7.6 G 2017-01-30 07:01 /var/log/spark/apps/application_1484079613665_0016
-rwxrwx--- 2 hadoop spark 1.3 G 2017-01-31 02:59 /var/log/spark/apps/application_1484079613665_0017
-rwxrwx--- 2 hadoop spark 2.1 G 2017-02-01 12:04 /var/log/spark/apps/application_1484079613665_0018
-rwxrwx--- 2 hadoop spark 2.8 G 2017-02-03 08:32 /var/log/spark/apps/application_1484079613665_0019
-rwxrwx--- 2 hadoop spark 5.4 G 2017-02-07 02:03 /var/log/spark/apps/application_1484079613665_0020
-rwxrwx--- 2 hadoop spark 9.3 G 2017-02-13 03:58 /var/log/spark/apps/application_1484079613665_0021
-rwxrwx--- 2 hadoop spark 2.0 G 2017-02-14 11:13 /var/log/spark/apps/application_1484079613665_0022
-rwxrwx--- 2 hadoop spark 1.1 G 2017-02-15 03:49 /var/log/spark/apps/application_1484079613665_0023
-rwxrwx--- 2 hadoop spark 8.8 G 2017-02-21 05:42 /var/log/spark/apps/application_1484079613665_0024
-rwxrwx--- 2 hadoop spark 371.2 M 2017-02-21 11:54 /var/log/spark/apps/application_1484079613665_0025
-rwxrwx--- 2 hadoop spark 1.4 G 2017-02-22 09:17 /var/log/spark/apps/application_1484079613665_0026
-rwxrwx--- 2 hadoop spark 3.2 G 2017-02-24 12:36 /var/log/spark/apps/application_1484079613665_0027
-rwxrwx--- 2 hadoop spark 9.5 M 2017-02-24 12:48 /var/log/spark/apps/application_1484079613665_0028
-rwxrwx--- 2 hadoop spark 20.5 G 2017-03-10 04:00 /var/log/spark/apps/application_1484079613665_0029
-rwxrwx--- 2 hadoop spark 7.3 G 2017-03-10 04:04 /var/log/spark/apps/application_1484079613665_0030.inprogress