3
votes

I have Airflow running with CeleryExecutor and 2 workers. When my DAG runs, the tasks generate a log on the filesystem of the worker that ran them. But when I go to the Web UI and click on the task logs, I get:

*** Log file does not exist: /usr/local/airflow/logs/test_dag/task2/2019-11-01T18:12:16.309655+00:00/1.log
*** Fetching from: http://70953abf1c10:8793/log/test_dag/task2/2019-11-01T18:12:16.309655+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='70953abf1c10', port=8793): Max retries exceeded with url: /log/test_dag/task2/2019-11-01T18:12:16.309655+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f329c3a2650>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

http://70953abf1c10:8793/ is obviously not the correct IP of the worker. However, celery@70953abf1c10 is the name of this worker in Celery. It seems like Airflow is trying to learn the worker's URL from Celery, but Celery is giving the worker's name instead. How can I solve this?

2
This is actually two separate errors. The first error is it cannot find the log file. The second error is when it then attempts to fetch the log file over HTTP. The answers so far attempt to fix the second problem. But it is probably easier to fix the first problem by mounting a shared docker volume for each container. See stackoverflow.com/a/67741414/1004759James James

2 Answers

2
votes

DejanLekic's solution put me on the right track, but it wasn't entirely obvious, so I'm adding this answer to clarify.

In my case I was running Airflow on Docker containers. By default, Docker containers use a bridge network called bridge. This is a special network that does not automatically resolve hostnames. I created a new bridge network in Docker called airflow-net and had all my Airflow containers join this one (leaving the default bridge was not necessary). Then everything just worked.

By default, Docker sets the hostname to the hex ID of the container. In my case the container ID began with 70953abf1c10 and the hostname was also 70953abf1c10. There is a Docker parameter for specifying hostname, but it turned out to not be necessary. After I connected the containers to a new bridge network, 70953abf1c10 began to resolve to that container.

1
votes

Simplest solution is either to use the default name, which will include the hostname, or to explicitly set the node name that has a valid host name in it (example: [email protected]).

If you use the default settings, then machine running the airflow worker has incorrectly set hostname to 70953abf1c10. You should fix this by running something like: hostname -B hostname.domain.tld