I deployed the Airflow webserver, scheduler, worker, and Flower on my Kubernetes cluster using Docker images. The Airflow version is 1.8.0.
Now I want to send worker logs to S3, so I did the following:
- Create an S3 connection in Airflow from the Admin UI (just set S3_CONN as the conn id and s3 as the type; because my Kubernetes cluster is running on AWS and all nodes have S3 access roles, that should be sufficient)
- Set the Airflow config as follows:

    remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
    remote_log_conn_id = S3_CONN
    encrypt_s3_logs = False
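These settings live under the [core] section of airflow.cfg in 1.8.0. For completeness, this is roughly that part of my config (the base_log_folder path is just the default from the Docker image I use, so treat it as an assumption):

    [core]
    # Local log directory inside the containers (path assumed from the image defaults)
    base_log_folder = /usr/local/airflow/logs
    # Remote logging settings described above
    remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
    remote_log_conn_id = S3_CONN
    encrypt_s3_logs = False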
First, I tried creating a DAG that simply raises an exception immediately after it starts running. This works, and the log can be seen on S3.
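For reference, that test DAG was essentially this (a minimal sketch reconstructing it; the DAG and task names are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def fail_immediately():
        # Fail right away; once the task completes (here: fails), its log
        # should be shipped to the remote S3 location.
        raise Exception("intentional failure to test remote logging")


    dag = DAG(
        dag_id="s3_log_test",
        start_date=datetime(2017, 1, 1),
        schedule_interval=None,
    )

    fail_task = PythonOperator(
        task_id="fail_immediately",
        python_callable=fail_immediately,
        dag=dag,
    )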
Then I modified the DAG so that it creates an EMR cluster and waits for it to become ready (the WAITING status), roughly along the lines of the sketch below. To apply the change, I restarted all 4 Airflow Docker containers.
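This is a minimal sketch of the shape of the new DAG; the instance types, release label, roles, region, and names are placeholders for my actual values:

    from datetime import datetime
    import time

    import boto3
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def create_emr_and_wait():
        """Start an EMR cluster and block until it reaches the WAITING state."""
        emr = boto3.client("emr", region_name="us-east-1")
        response = emr.run_job_flow(
            Name="k8s-airflow-test",
            ReleaseLabel="emr-5.4.0",
            Instances={
                "MasterInstanceType": "m3.xlarge",
                "SlaveInstanceType": "m3.xlarge",
                "InstanceCount": 3,
                "KeepJobFlowAliveWhenNoSteps": True,
            },
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )
        cluster_id = response["JobFlowId"]
        # Poll until the cluster is ready and idle, or bail out if it died
        while True:
            state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
            if state == "WAITING":
                return cluster_id
            if state in ("TERMINATED", "TERMINATED_WITH_ERRORS"):
                raise RuntimeError("EMR cluster failed to start: %s" % state)
            time.sleep(60)


    dag = DAG(
        dag_id="emr_cluster_test",
        start_date=datetime(2017, 1, 1),
        schedule_interval=None,
    )

    create_cluster = PythonOperator(
        task_id="create_emr_cluster_and_wait",
        python_callable=create_emr_and_wait,
        dag=dag,
    )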
Now the DAG appears to work: a cluster is started, and once it is ready the DAG is marked as success. But I can see no logs on S3.
There is no related error log on the worker or the webserver, so I cannot even see what might be causing this issue. The logs are simply not sent.
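As a sanity check, the remote log prefix can be listed from inside a worker container with something like this (a sketch; it assumes boto3 is installed in the image and that the node role allows s3:ListBucket on the bucket):

    # List whatever has landed under the remote log prefix, relying on the
    # node's instance role for credentials.
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    resp = s3.list_objects_v2(
        Bucket="aws-logs-xxxxxxxx-us-east-1",
        Prefix="k8s-airflow/",
    )
    for obj in resp.get("Contents", []):
        print(obj["Key"])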
Does anyone know if there is some restriction on Airflow's remote logging, other than this description in the official documentation? https://airflow.incubator.apache.org/configuration.html#logs
In the Airflow Web UI, local logs take precedence over remote logs. If local logs can not be found or accessed, the remote logs will be displayed. Note that logs are only sent to remote storage once a task completes (including failure). In other words, remote logs for running tasks are unavailable.
I didn't expect this, but are logs not sent to remote storage when a task succeeds?