2 votes

I'm using Apache Airflow and noticed that the gunicorn-error.log has grown to over 50 GB within 5 months. Most of the log messages are INFO-level entries like:

[2018-05-14 17:31:39 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:32:37 +0000] [2359] [INFO] Worker exiting (pid: 2359)
[2018-05-14 17:33:07 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:33:07 +0000] [5758] [INFO] Booting worker with pid: 5758
[2018-05-14 17:33:10 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:33:41 +0000] [2994] [INFO] Worker exiting (pid: 2994)
[2018-05-14 17:34:11 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:34:11 +0000] [6400] [INFO] Booting worker with pid: 6400
[2018-05-14 17:34:13 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:34:36 +0000] [3611] [INFO] Worker exiting (pid: 3611)

Within the Airflow config file I'm only able to set the log file path. Does anyone know how to change the gunicorn log level within Airflow? I don't need logging this fine-grained, and it is filling up my hard drive.

I have it at "/var/log/airflow" and the log location can be set within airflow.cfg. I have not modified my Airflow setup and I'm using v1.8.0. I've now set "LOGGING_LEVEL = logging.WARNING" at "/.local/lib/python3.5/site-packages/airflow/settings.py". Now there are no more INFO logs, but it does not seem to be the best solution... – Stev
Well, we also have the log path set but there is no gunicorn.log. This might be connected to v1.8.0? Also, it is possible to set the log level in airflow.cfg - at least in 1.9.0 – tobi6

2 Answers

3 votes

I managed to solve the problem by setting an environment variable:

GUNICORN_CMD_ARGS="--log-level WARNING"
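
If you start the webserver directly from a shell, exporting the variable first should work the same way (assuming a gunicorn version that honours GUNICORN_CMD_ARGS; see the version note below):

    export GUNICORN_CMD_ARGS="--log-level WARNING"
    airflow webserver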

If setting this in a docker-compose.yml file, the following is tested with apache-airflow==1.10.6 and gunicorn==19.9.0:

environment:
    - 'GUNICORN_CMD_ARGS=--log-level WARNING'

If setting this in a Dockerfile, the following is tested with apache-airflow==1.10.6 and gunicorn==19.9.0:

ENV GUNICORN_CMD_ARGS --log-level WARNING
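
Note that, as far as I know, GUNICORN_CMD_ARGS is only honoured by gunicorn 19.7 and later, so older Airflow installations that pin an earlier gunicorn will silently ignore it.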
0 votes

Logging in Airflow seems a bit tricky to me. One reason is that it is split into several parts: for instance, the logging configuration for Airflow itself is completely separate from that of the gunicorn webserver (the "spam" logs you mention come from gunicorn).

To solve this spam problem, I modified Airflow's bin/cli.py slightly by adding a few lines to the webserver() function:

    if args.log_config:
        run_args += ['--log-config', str(args.log_config)]

(for the sake of brevity I haven't pasted the code to handle the argument)
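
For reference, --log-config is gunicorn's own flag for passing a Python logging config file, so all the added code has to do is forward the path. The argument handling could look roughly like the following; this is a hypothetical sketch, not the actual patch (Airflow 1.x registers CLI arguments through the CLIFactory machinery in bin/cli.py, so the exact wiring differs):

    # Hypothetical sketch: register a --log-config option on the webserver
    # subcommand so it can be forwarded to gunicorn (names are illustrative).
    parser.add_argument(
        '--log-config',
        dest='log_config',
        default=None,
        help='Path to a Python logging config file, passed through to gunicorn')

The two lines shown above then pick up args.log_config and append it to gunicorn's command line.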

Then, as the log config file, I use something similar to:

[loggers]
keys=root, gunicorn.error, gunicorn.access

[handlers]
keys=console, error_file, access_file

[formatters]
keys=generic, access

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=0
qualname=gunicorn.error

[logger_gunicorn.access]
level=INFO
handlers=access_file
propagate=1
qualname=gunicorn.access

[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout, )

[handler_error_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=generic
args=('/home/airflow/airflow/logs/webserver/gunicorn.error.log',)

[handler_access_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=access
args=('/home/airflow/airflow/logs/webserver/gunicorn.access.log',)

[formatter_generic]
format=[%(name)s] [%(module)s] [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
#format=[%(levelname)s] %(asctime)s [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter

[formatter_access]
format=%(message)s
class=logging.Formatter

Note the "propagate=0" in gunicorn.error, which avoids the spams in your stdout. You still have them but at least it is localized in /home/airflow/airflow/logs/webserver/gunicorn.error.log , which should be rotated (I haven't fully tested yet the rotation part to be honest).

If I have time, I'll submit this change as a Jira ticket for Airflow.