
I've read the documentation for creating an Airflow Connection via an environment variable and am using Airflow v1.10.6 with Python3.5 on Debian9.

The linked documentation above shows an example S3 connection URI, s3://accesskey:secretkey@S3. From that, I defined the following environment variable:

AIRFLOW_CONN_AWS_S3=s3://#MY_ACCESS_KEY#:#MY_SECRET_ACCESS_KEY#@S3

And the following function:

import airflow.hooks.S3_hook

def download_file_from_S3_with_hook(key, bucket_name):
    """Get file contents from S3"""
    hook = airflow.hooks.S3_hook.S3Hook('aws_s3')
    obj = hook.get_key(key, bucket_name)
    contents = obj.get()['Body'].read().decode('utf-8')
    return contents

However, when I invoke that function I get the following error:

Using connection to: id: aws_s3.
    Host: #MY_ACCESS_KEY#,
    Port: None,
    Schema: #MY_SECRET_ACCESS_KEY#,
    Login: None,
    Password: None,
    extra: {}
ERROR - Unable to locate credentials

It appears that when I format the URI according to Airflow's documentation, it's setting the access key as the host and the secret access key as the schema.

It's clearly reading the environment variable, as it has the correct conn_id. It also has the correct values for my access key and secret; it's just parsing them into the wrong fields.

When I set the connection in the UI, the function works if I set Login to my access key and Password to my token. So how am I formatting my environment variable URI wrong?
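The misparse described above can be reproduced with Python's urlparse, which Airflow 1.10 uses to split a connection URI. A minimal sketch, assuming a hypothetical secret key that contains an unencoded `/` (the placeholder values are illustrative, not the real credentials):

```python
from urllib.parse import urlparse

# Hypothetical secret "SECRET/KEY" with an unencoded '/':
# the '/' terminates the netloc early, so the '@' never lands in it.
bad = urlparse("s3://ACCESSKEY:SECRET/KEY@S3")

print(bad.hostname)  # 'accesskey' -- the access key becomes the (lowercased) host
print(bad.username)  # None       -- no login is found
print(bad.password)  # None       -- no password is found
print(bad.path)      # '/KEY@S3'  -- the secret's remainder spills into the path
```

Airflow reads the connection's Schema from the URI path, which matches the error output above: the access key shows up as Host, the leftover secret as Schema, and Login/Password come back empty.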


1 Answer


Found the issue: s3://accesskey:secretkey@S3 is the correct format. The problem was that my aws_secret_access_key contained a special character and had to be URL-encoded. That fixed everything.
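The encoding step can be sketched with urllib.parse.quote. The credentials below are hypothetical stand-ins; passing safe="" ensures even `/` is percent-encoded:

```python
from urllib.parse import quote

access_key = "AKIAEXAMPLE"   # hypothetical access key
secret_key = "abc/def+ghi"   # hypothetical secret with special characters

# Percent-encode both parts before embedding them in the connection URI.
uri = "s3://{}:{}@S3".format(quote(access_key, safe=""), quote(secret_key, safe=""))
print(uri)  # s3://AKIAEXAMPLE:abc%2Fdef%2Bghi@S3
```

The resulting string is what goes into the AIRFLOW_CONN_AWS_S3 environment variable.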