1
votes

We're running an Airflow cluster using the puckel/airflow Docker image with docker-compose. The scheduler container writes its logs to /usr/local/airflow/logs/scheduler.

The problem is that the log files are not rotated, so disk usage keeps growing until the disk is full. A DAG for cleaning up the log directory is available, but it runs on the worker nodes, so the log directory in the scheduler container is never cleaned up.

I'm looking for a way to send the scheduler log to stdout or to an S3/GCS bucket, but haven't found one. Is there any way to do this?


3 Answers

1
votes

I finally managed to output the scheduler's log to stdout.

Here you can find how to use a custom logging configuration in Airflow. The default logging config is available on GitHub.

What you have to do is:

(1) Create a custom logging config at ${AIRFLOW_HOME}/config/log_config.py.


# Send processor (scheduler, etc.) logs to stdout
# See https://www.astronomer.io/guides/logging
# This file follows https://airflow.apache.org/docs/apache-airflow/2.0.0/logging-monitoring/logging-tasks.html#advanced-configuration

import sys
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

# Copy Airflow's default config and replace the "processor" handler
# (used by the scheduler's DAG file processor) with a plain StreamHandler
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["handlers"]["processor"] = {
    "class": "logging.StreamHandler",
    "formatter": "airflow",
    "stream": sys.stdout,
}

(2) Set the logging_config_class property to config.log_config.LOGGING_CONFIG in airflow.cfg:

logging_config_class = config.log_config.LOGGING_CONFIG
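
Since the cluster runs under docker-compose, you can also pass the same setting as an environment variable instead of editing airflow.cfg inside the image, using Airflow's AIRFLOW__<SECTION>__<KEY> convention. A minimal sketch, assuming Airflow 2.x (where the option lives in the [logging] section; on 1.10 it was [core]) and a compose service named scheduler with AIRFLOW_HOME=/usr/local/airflow:

# docker-compose.yml (sketch) -- service name and paths are assumptions
services:
  scheduler:
    environment:
      - AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.log_config.LOGGING_CONFIG
      # make the config package importable (see step 3)
      - PYTHONPATH=/usr/local/airflow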

(3) [Optional] Add $AIRFLOW_HOME to the PYTHONPATH environment variable so that Python can find the config package:

export "${PYTHONPATH}:~"
  • Actually, you can set the path of logging_config_class to anything, as long as Python is able to import the module.
  • Setting the processor handler to airflow.utils.log.logging_mixin.RedirectStdHandler didn't work for me; it used too much memory.
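
Once this config is picked up, the scheduler's processor logs go to the container's stdout/stderr, so you can follow them with docker-compose (assuming the compose service is named scheduler):

docker-compose logs -f scheduler
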
0
votes

Setting remote_logging = True in airflow.cfg is the key. Please check the thread here for detailed steps.
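
For reference, the relevant block looks roughly like the sketch below, assuming Airflow 2.x (options under [logging]) and an S3 target; my_s3_conn and the bucket path are placeholders for a connection and bucket you have to create yourself:

[logging]
remote_logging = True
remote_log_conn_id = my_s3_conn
remote_base_log_folder = s3://my-bucket/airflow/logs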

0
votes

You can extend the image with the following environment variables, or set the equivalent options in airflow.cfg:

ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=gcp_conn_id
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=gs://bucket_name/AIRFLOW_LOGS

The connection referenced by gcp_conn_id should have the correct permissions to create/delete objects in GCS.
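
As a sketch of what that can look like (the service account, project, and bucket are placeholders; use whatever your gcp_conn_id actually authenticates as), you can bind roles/storage.objectAdmin on the log bucket:

# grant the connection's service account create/delete rights on the log bucket
gsutil iam ch \
  serviceAccount:airflow-logs@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://bucket_name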