
I'm running Flink jobs on EMR. In my EMR cluster, I can see my job logs in the S3 Log URI s3://aws-logs-xxxxx/elasticmapreduce/, and there are also logs under /usr/lib/flink/log/ on the master node. Since we only configured 20 GB of root volume space, it is easy to hit that limit because of the log files (flink-flink-historyserver-xxxxx.log) under /usr/lib/flink/log/.

My questions are:

  1. What determines that log files are written to /usr/lib/flink/log/?
  2. Since we already have the logs in S3, do we still need the logs under /usr/lib/flink/log/?
  3. Is there a way to disable them, or to do something like Spark's history server cleaner:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge  12h
spark.history.fs.cleaner.interval 1h

Here's my HistoryServer configuration in flink-conf.yaml:

# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
jobmanager.archive.fs.dir: hdfs:///completed-jobs/

# The address under which the web-based HistoryServer listens.
historyserver.web.address: 0.0.0.0

# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
historyserver.archive.fs.refresh-interval: 10000

1 Answer


You can modify the env.log.dir and env.log.max settings in flink-conf.yaml to control where the log files are written and how many rotated files are kept. For more detailed logging configuration, you can edit the Log4j properties files in the conf folder.
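For example, a minimal sketch in flink-conf.yaml (the /mnt/flink-logs path is just an illustration; point it at any volume with enough space):

# Directory where Flink writes its *.log / *.out files
# (example path on a larger, non-root volume)
env.log.dir: /mnt/flink-logs

# Maximum number of rotated log files to keep per process
env.log.max: 5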

You can refer to the following documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/advanced/logging/
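If you also want size-based rotation, here is a sketch of a RollingFile appender for conf/log4j.properties (Flink 1.13 uses the Log4j 2 properties format; the appender name, 100MB size, and max of 10 files below are illustrative values, not EMR defaults):

# Roll the main log file once it reaches 100 MB and keep at most 10 old files
appender.main.name = MainAppender
appender.main.type = RollingFile
appender.main.append = true
appender.main.fileName = ${sys:log.file}
appender.main.filePattern = ${sys:log.file}.%i
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
appender.main.policies.type = Policies
appender.main.policies.size.type = SizeBasedTriggeringPolicy
appender.main.policies.size.size = 100MB
appender.main.strategy.type = DefaultRolloverStrategy
appender.main.strategy.max = 10

# Route the root logger to the rolling appender
rootLogger.level = INFO
rootLogger.appenderRef.main.ref = MainAppender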