I've read the documentation for creating an Airflow Connection via an environment variable and am using Airflow v1.10.6 with Python3.5 on Debian9.
The linked documentation above shows an example S3 connection of s3://accesskey:secretkey@S3. From that, I defined the following environment variable:
AIRFLOW_CONN_AWS_S3=s3://#MY_ACCESS_KEY#:#MY_SECRET_ACCESS_KEY#@S3
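For reference, here is a quick stdlib check I used (with made-up placeholder keys, not my real ones) to see how Python's urlparse reads a URI like this. Since AWS secret keys often contain characters such as / or +, I percent-encoded both parts before building the URI:

```python
from urllib.parse import quote_plus, urlparse

# Hypothetical placeholder credentials, not real ones.
access_key = "AKIAEXAMPLE"
secret_key = "abc/def+ghi"  # secrets often contain '/', '+', etc.

# Percent-encode both credentials before building the connection URI.
uri = "s3://{}:{}@S3".format(quote_plus(access_key), quote_plus(secret_key))
parsed = urlparse(uri)
print(parsed.username)  # the access key
print(parsed.password)  # the secret key, still percent-encoded in the parse result
print(parsed.hostname)  # the trailing "S3" host segment (lowercased by urlparse)
```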
And the following function:

import airflow.hooks.S3_hook

def download_file_from_S3_with_hook(key, bucket_name):
    """Get file contents from S3."""
    hook = airflow.hooks.S3_hook.S3Hook('aws_s3')
    obj = hook.get_key(key, bucket_name)
    contents = obj.get()['Body'].read().decode('utf-8')
    return contents
However, when I invoke that function I get the following error:
Using connection to: id: aws_s3.
Host: #MY_ACCESS_KEY#,
Port: None,
Schema: #MY_SECRET_ACCESS_KEY#,
Login: None,
Password: None,
extra: {}
ERROR - Unable to locate credentials
It appears that when I format the URI according to Airflow's documentation, the access key is parsed as the host and the secret access key as the schema.
It's clearly reading the environment variable, since the conn_id is correct, and the values for my access key and secret are there too; they're just being parsed into the wrong fields.
When I set the connection in the UI, the function works if I set Login to my access key and Password to my secret. So how is my environment variable URI formatted wrong?
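For context, here is a rough sketch of how I understand a connection URI maps onto the connection fields (this is my assumption about the mapping, not taken from Airflow's source; the function name is mine):

```python
from urllib.parse import urlparse

def parse_conn_uri(uri):
    """Approximate (assumed) mapping of a connection URI onto connection fields."""
    parts = urlparse(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "schema": parts.path[1:] if parts.path else "",
        "login": parts.username,
        "password": parts.password,
        "port": parts.port,
    }

# With well-formed placeholder credentials, login and password land where expected.
print(parse_conn_uri("s3://AKIAEXAMPLE:SECRETEXAMPLE@S3"))
```

If this mapping is right, I'd expect my access key under login and my secret under password, not under host and schema.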