
I have recently upgraded Airflow from 1.10.0 to 1.10.10. In the current setup, the webserver, worker, scheduler and flower all run on the same machine, and we use the Celery executor. When a DAG runs, the first step spins up a new EMR cluster for the DAG, along with a separate worker node where only the worker process runs. This worker node sends tasks to run on the EMR cluster. Once the tasks finish, the final steps terminate the EMR cluster and then this worker instance. Every task's log is written on this worker node. As long as the tasks are running, or the worker node is still up, I can see the logs in the web UI; but as soon as the worker is terminated, I can no longer see them. The config is set up to upload logs to S3. I do see the logs for the startEMR and startWorker steps on S3, since those run on the main Airflow instance (where all 4 processes are running). Here is the relevant snippet of airflow.cfg:

base_log_folder = /home/deploy/airflow/logs
remote_logging = True
remote_base_log_folder = s3://airflow-log-bucket/airflow/logs/
remote_log_conn_id = aws_default
encrypt_s3_logs = False
s3_log_folder = '/airflow/logs/'
executor = CeleryExecutor
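
For reference, here is a minimal check (assuming the bucket name and connection id from the snippet above; the object key is just a placeholder) that can be run on the worker node to confirm it is able to write to the remote log location:

# Connectivity check for remote logging; run with the same Python
# interpreter that runs the Airflow worker. Bucket and connection id
# are taken from the airflow.cfg snippet above; the key name is arbitrary.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")
hook.load_string(
    string_data="remote logging connectivity check",
    key="airflow/logs/_connectivity_check.txt",
    bucket_name="airflow-log-bucket",
    replace=True,
)
print(hook.check_for_key("airflow/logs/_connectivity_check.txt",
                         bucket_name="airflow-log-bucket"))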

The same config file is set up when the worker instance is initialized for the DAG, and only the worker process is started on that node. Here is the log shown for a task after the worker node has been terminated:

*** Log file does not exist: /home/deploy/airflow/logs/XXXX/XXXXXX/2020-07-07T23:30:05+00:00/1.log
*** Fetching from: http://ip-10-164-62-253.ap-southeast-2.compute.internal:8799/log/XXXX/XXXXXX/2020-07-07T23:30:05+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='ip-10-164-62-253.ap-southeast-2.compute.internal', port=8799): Max retries exceeded with url: /log/xxxx/XXXXXX/2020-07-07T23:30:05+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6750ac52b0>: Failed to establish a new connection: [Errno 113] No route to host',))

So basically -

  1. This was working in Airflow 1.10.0 (I did not need to add remote_logging=True there).
  2. The logs for the EMR-start and worker-node-start steps are copied to S3 and are shown in the web UI.
  3. Only the logs of tasks running on the remote worker node are not copied to S3.

Can someone please let me know what I am missing in the configuration? The same config used to work on Airflow 1.10.0.


1 Answer


I found the mistake I was making. The S3 module being installed on the new worker node was installed via pip instead of pip3, while the Airflow server had it installed via pip3. Another config change I had to make was in the [webserver] section of airflow.cfg:

worker_class = sync

This was previously gevent.
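
To confirm the dependency side of the fix, a quick check (assuming boto3 is the package the S3 module pulled in) is to run an import with the same Python interpreter that starts the Airflow worker:

# Run with the worker's interpreter (python3 here). If boto3/S3Hook were
# installed only for pip's Python 2 interpreter, this import fails with
# ImportError under python3, which is what broke the S3 log upload.
import boto3
from airflow.hooks.S3_hook import S3Hook

print("boto3 version:", boto3.__version__)
print("S3Hook importable:", S3Hook.__name__)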