2 votes

I deployed the Airflow webserver, scheduler, worker, and Flower on my Kubernetes cluster using Docker images. The Airflow version is 1.8.0.

Now I want to send worker logs to S3, so I did the following:

  1. Created an S3 connection in the Airflow Admin UI (just set S3_CONN as the conn id and s3 as the type; because my Kubernetes cluster is running on AWS and all nodes have S3 access roles, this should be sufficient).
  2. Set the Airflow config as follows:

         remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
         remote_log_conn_id = S3_CONN
         encrypt_s3_logs = False

First I tried creating a DAG that just raises an exception immediately after it starts running (a sketch is below). This works, and the log can be seen on S3.
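For reference, that test DAG was essentially a single PythonOperator that fails on purpose; the dag_id and dates here are placeholders, not my exact code:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def fail_immediately():
        # Fail right away so the task finishes (failed) and its log should be uploaded to S3.
        raise RuntimeError("intentional failure to test S3 remote logging")


    dag = DAG(
        dag_id="s3_logging_smoke_test",
        start_date=datetime(2017, 1, 1),
        schedule_interval=None,
    )

    fail_task = PythonOperator(
        task_id="fail_immediately",
        python_callable=fail_immediately,
        dag=dag,
    )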

Then I modified the DAG so that it creates an EMR cluster and waits for it to become ready (WAITING status). To apply the change, I restarted all four Airflow Docker containers.
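The modified DAG does roughly the following; this is only a sketch with placeholder cluster parameters (instance types, release label, role names), not my exact code:

    import time
    from datetime import datetime

    import boto3
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def create_cluster(**context):
        # Start an EMR cluster and return its JobFlowId (pushed to XCom).
        emr = boto3.client("emr", region_name="us-east-1")
        response = emr.run_job_flow(
            Name="k8s-airflow-test",
            ReleaseLabel="emr-5.4.0",
            Instances={
                "MasterInstanceType": "m3.xlarge",
                "SlaveInstanceType": "m3.xlarge",
                "InstanceCount": 3,
                "KeepJobFlowAliveWhenNoSteps": True,
            },
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )
        return response["JobFlowId"]


    def wait_until_waiting(**context):
        # Poll the cluster state until it reaches WAITING.
        emr = boto3.client("emr", region_name="us-east-1")
        cluster_id = context["task_instance"].xcom_pull(task_ids="create_cluster")
        while True:
            state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
            if state == "WAITING":
                return
            if state in ("TERMINATED", "TERMINATED_WITH_ERRORS"):
                raise RuntimeError("cluster terminated before reaching WAITING: %s" % state)
            time.sleep(60)


    dag = DAG(
        dag_id="emr_cluster_test",
        start_date=datetime(2017, 1, 1),
        schedule_interval=None,
    )

    create = PythonOperator(
        task_id="create_cluster",
        python_callable=create_cluster,
        provide_context=True,
        dag=dag,
    )

    wait = PythonOperator(
        task_id="wait_for_waiting_state",
        python_callable=wait_until_waiting,
        provide_context=True,
        dag=dag,
    )

    create >> wait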

Now the DAG appears to work: a cluster is started and, once it is ready, the DAG is marked as successful. But I can see no logs on S3.

There are no related errors in the worker or webserver logs, so I cannot even tell what is causing this. The logs are simply not sent.

Does anyone know if there is some restriction on Airflow's remote logging, other than this note in the official documentation? https://airflow.incubator.apache.org/configuration.html#logs

In the Airflow Web UI, local logs take precedence over remote logs. If local logs can not be found or accessed, the remote logs will be displayed. Note that logs are only sent to remote storage once a task completes (including failure). In other words, remote logs for running tasks are unavailable.

I didn't expect it, but are logs not sent to remote storage even when a task succeeds?


1 Answer

1 vote

The boto version that is installed with Airflow is 2.46.1, and that version doesn't use IAM instance roles.

Instead, you will have to add the access key and secret of an IAM user that has S3 access to the Extra field of your S3_CONN connection.

Like so: {"aws_access_key_id":"123456789","aws_secret_access_key":"secret12345"}
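If you'd rather not paste credentials into the Admin UI, the same connection can also be created programmatically through Airflow's ORM; a minimal sketch (the key values are placeholders, as above):

    from airflow import settings
    from airflow.models import Connection

    # Create S3_CONN with the IAM user's credentials stored in the Extra field.
    conn = Connection(
        conn_id="S3_CONN",
        conn_type="s3",
        extra='{"aws_access_key_id": "123456789", "aws_secret_access_key": "secret12345"}',
    )

    session = settings.Session()
    session.add(conn)
    session.commit()
    session.close()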