
This is driving me nuts.

I'm setting up Airflow in a cloud environment. I have one server running the scheduler and the webserver, and one server as a Celery worker, and I'm using Airflow 1.8.0.

Running jobs works fine. What refuses to work is logging.

I've set up the correct path in airflow.cfg on both servers:

remote_base_log_folder = s3://my-bucket/airflow_logs/
remote_log_conn_id = s3_logging_conn

I've set up s3_logging_conn in the Airflow UI, with the access key and the secret key, as described here.
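If the instructions behind that link are the usual ones, the connection boils down to the fields below; the Extra key names are the standard boto-style ones, and the values are placeholders rather than my real keys:

    Conn Id:   s3_logging_conn
    Conn Type: S3
    Extra:     {"aws_access_key_id": "<access key>", "aws_secret_access_key": "<secret key>"}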

I checked the connection using

s3 = airflow.hooks.S3Hook('s3_logging_conn')
s3.load_string('test', 'test', bucket_name='my-bucket')
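As a slightly fuller round-trip check, this also passes on both servers (a minimal sketch assuming the 1.8-era S3Hook API, where check_for_key exists alongside load_string):

    s3 = airflow.hooks.S3Hook('s3_logging_conn')
    # write a test object, then confirm the same credentials can see it
    s3.load_string('test', 'test', bucket_name='my-bucket')
    assert s3.check_for_key('test', bucket_name='my-bucket')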

This works on both servers. So the connection is properly set up. Yet all I get whenever I run a task is

*** Log file isn't local.
*** Fetching here: http://*******
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537

I tried manually uploading the log following the expected conventions, and the webserver still can't pick it up, so the problem is on both ends, writing and reading. I'm at a loss as to what to do; everything I've read so far tells me this should be working. I'm close to just installing 1.9.0, which I hear changes logging, to see if I have more luck.
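For reference, my manual upload targeted the exact key from the error above. As best I can tell, 1.8 composes the remote key from the base folder, the dag id, the task id and the execution date's ISO timestamp, with no file extension; a small sketch reconstructing it from the values in the error message:

    # hypothetical reconstruction of the remote log key Airflow 1.8 reads back
    base = "s3://my-bucket/airflow_logs"
    dag_id, task_id = "my-dag", "my-task"
    execution_date = "2018-02-15T21:46:47.577537"
    print("{}/{}/{}/{}".format(base, dag_id, task_id, execution_date))
    # -> s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537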

UPDATE: I made a clean install of Airflow 1.9 and followed the specific instructions here.

The webserver won't even start now, and fails with the following error:

airflow.exceptions.AirflowConfigException: section/key [core/remote_logging] not found in config

There is an explicit reference to this section in this config template.
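If I'm reading the template right, the failing lookup is the guard around the remote logging handlers. Roughly paraphrased (not verbatim, and the surrounding names are from memory):

    # the conf.get call raises AirflowConfigException when airflow.cfg
    # has no remote_logging key under [core]
    REMOTE_LOGGING = conf.get('core', 'remote_logging')

    if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
        # swap the local task log handler for the S3 one
        ...

so the exception fires as soon as the settings module is imported.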

So I tried removing the reference and loading the S3 handler without the check, and got the following error message instead:

Unable to load the config, contains a configuration error.

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/config.py", line 384, in resolve
    self.importer(used)
ModuleNotFoundError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
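From what I can tell this is generic logging.config behaviour rather than anything Airflow-specific: dictConfig resolves the dotted 'class' path by importing the module portion and getattr-ing the final attribute, so a handler class that the installed package doesn't ship fails at import time. A stripped-down repro, using the handler path from the template (which apache-airflow==1.9.0 evidently doesn't provide):

    import logging.config

    logging.config.dictConfig({
        'version': 1,
        'handlers': {
            'console': {
                # resolve() getattrs RedirectStdHandler on logging_mixin, gets an
                # AttributeError, then retries the full dotted path as a module
                # import, which produces the "is not a package" error above
                'class': 'airflow.utils.log.logging_mixin.RedirectStdHandler',
                'stream': 'sys.stdout',
            },
        },
        'root': {'handlers': ['console']},
    })

Run against a plain 1.9.0 install, this raises the same ModuleNotFoundError, wrapped in a ValueError by dictConfig.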

I get the feeling that this shouldn't be this hard.

Any help would be much appreciated, cheers

I have now reinstalled everything, generated new credentials and upgraded to Airflow 1.9, and the problem persists. – pedrogfp

Please update the logs with the errors from Airflow 1.9; it should work, and some users are actually using this in production. – Fokko Driesprong

Done, added the new errors. – pedrogfp

Just a side note: the current template incubator-airflow/airflow/config_templates/airflow_local_settings.py present in the master branch contains a reference to the class "airflow.utils.log.s3_task_handler.S3TaskHandler", which is not present in the apache-airflow==1.9.0 Python package. The fix is simple: use this base template instead: github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/… After having done that, follow all other instructions in the mentioned answer. Note that this tweak regards s… – diogoa

1 Answer


Solved:

  1. upgraded to 1.9
  2. ran the steps described in this comment
  3. added

         [core]
         remote_logging = True

     to airflow.cfg (full resulting section sketched below)
  4. ran

         pip install --upgrade airflow[log]

Everything's working fine now.
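For reference, after steps 3 and 4 the relevant [core] section of airflow.cfg ends up looking roughly like this (the bucket and connection names are from my setup above; substitute your own):

    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn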