I'm running Flink jobs on EMR. I can see my job logs in the S3 Log URI (s3://aws-logs-xxxxx/elasticmapreduce/), and there are also logs under /usr/lib/flink/log/ on the master node. Since we only configured 20 GB of root volume space, it is easy to hit that limit because of the log files (e.g. flink-flink-historyserver-xxxxx.log) under /usr/lib/flink/log/.
My questions are:
- What configures writing log files to /usr/lib/flink/log/?
- As long as we already have the logs in S3, do we still need the logs under /usr/lib/flink/log/?
- Is there a way to disable them, or to do something like Spark's history-server cleaner settings:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 12h
spark.history.fs.cleaner.interval 1h
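For reference, one direction I was considering is capping the local log growth via log4j rotation. This is only a sketch assuming Flink ≥ 1.11, where /usr/lib/flink/conf/log4j.properties uses log4j2-style syntax; the appender names below mirror what newer Flink releases ship by default, but I haven't verified them against the EMR image:

```properties
# Hypothetical snippet for /usr/lib/flink/conf/log4j.properties (log4j2 syntax).
# Roll the active log file when it reaches 100 MB and keep at most 5 old files,
# so disk usage per process is bounded at roughly 600 MB.
appender.rolling.name = RollingFileAppender
appender.rolling.type = RollingFile
appender.rolling.append = true
appender.rolling.fileName = ${sys:log.file}
appender.rolling.filePattern = ${sys:log.file}.%i
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
appender.rolling.policies.type = Policies
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 100MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.max = 5

rootLogger.appenderRef.rolling.ref = RollingFileAppender
```

I don't know whether EMR overwrites this file when provisioning, so I'm not sure this is the right place to change it.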
Here's my HistoryServer configuration in flink-conf.yaml:
# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
jobmanager.archive.fs.dir: hdfs:///completed-jobs/
# The address under which the web-based HistoryServer listens.
historyserver.web.address: 0.0.0.0
# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082
# Comma separated list of directories to monitor for completed jobs.
historyserver.archive.fs.dir: hdfs:///completed-jobs/
# Interval in milliseconds for refreshing the monitored directories.
historyserver.archive.fs.refresh-interval: 10000
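Regarding the Spark-cleaner analog: I came across these HistoryServer options in newer Flink documentation, but I'm not sure they exist in the Flink version EMR ships, and they clean the archive directory (hdfs:///completed-jobs/) rather than the local log files:

```yaml
# Hypothetical additions to flink-conf.yaml (available only in newer Flink
# releases; verify against your version's docs before relying on them).
# Keep at most this many archived jobs; older archives get deleted.
historyserver.archive.retained-jobs: 100
# Drop cached entries whose underlying archive file was deleted.
historyserver.archive.clean-expired-jobs: true
```

Is there an equivalent mechanism for the process logs themselves?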