I use the Google Cloud Dataflow implementation in Python on Google Cloud Platform. My plan is to read input from AWS S3.
Google Cloud Dataflow (which is based on Apache Beam) supports reading files from S3. However, I cannot find in the documentation the recommended way to pass credentials to a job. I tried adding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables in the setup.py file. This works locally, but when I package the Cloud Dataflow job as a template and trigger it on GCP, it sometimes works and sometimes does not, raising a "NoCredentialsError" exception and causing the job to fail.
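For reference, this is roughly what I mean by "environment variables within setup.py" (a minimal sketch, not the actual file; the package name and placeholder values are hypothetical):

```python
# setup.py -- sketch of the approach described above: exporting AWS
# credentials as environment variables when the package is set up.
import os
import setuptools

# Assumption: the credentials are injected here at setup time. On Dataflow
# workers this code only runs while the package is being installed, not
# necessarily in the process that later executes the pipeline.
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-access-key>"

setuptools.setup(
    name="my-dataflow-job",  # hypothetical package name
    version="0.0.1",
    packages=setuptools.find_packages(),
)
```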
Is there any coherent, best-practice way to pass AWS credentials to a Python Google Cloud Dataflow job on GCP?
awsAccessKey and awsSecretKey flags? - Alexandre Moraes
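For the Python SDK, the flags the comment refers to would correspond to Beam's S3Options pipeline options rather than environment variables. A minimal sketch, assuming a Beam release whose boto3-based S3 filesystem reads credentials from S3Options (apache-beam[aws]); the project, region, and bucket names are placeholders:

```python
# Sketch: pass AWS credentials to the job as pipeline options instead of
# environment variables, so the workers receive them with the job itself.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    # Dataflow runner settings (hypothetical project/bucket values).
    runner="DataflowRunner",
    project="my-gcp-project",
    region="europe-west1",
    temp_location="gs://my-bucket/temp",
    # S3Options flags consumed by Beam's S3 client.
    s3_access_key_id="<access-key-id>",
    s3_secret_access_key="<secret-access-key>",
)

with beam.Pipeline(options=options) as p:
    lines = p | "ReadFromS3" >> beam.io.ReadFromText("s3://my-bucket/input/*.csv")
    _ = lines | "WriteToGCS" >> beam.io.WriteToText("gs://my-bucket/output/out")
```

The same flags could equally be supplied on the command line (for example `--s3_access_key_id=...`), which avoids baking secrets into setup.py.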