6
votes

I am trying to set up a Google Cloud Platform connection in Google Cloud Composer using a service account key. I created a GCS bucket and put the service account key file, stored as JSON, in the bucket. In the Keyfile Path field I specified the GCS bucket, and in the Keyfile JSON field I specified the file name. The scope is https://www.googleapis.com/auth/cloud-platform.

When I try to use this connection to start a Dataproc cluster, I get an error saying that the JSON file cannot be found.

Looking at the error message, the code tries to open the file with: with open(filename, 'r') as file_obj, which obviously won't work with a GCS bucket path.

So my question is: where should I put this service account key file if it cannot be read from a GCS path?

So do you use Cloud Composer and SSH into the machines to save the files? – Charles Zhan

6 Answers

4
votes

I'm assuming you want your operators to use a service account distinct from the default auto-generated compute account that Composer runs under.

The docs indicate that you can add a new Airflow Connection for the service account, which includes copy-pasting the entire JSON key file into the Airflow Connection config (look for Keyfile JSON once you select the Google Cloud Platform connection type).
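
If you'd rather create that connection programmatically than paste the key into the UI, something along these lines should work. This is a minimal sketch assuming an Airflow 1.10-era Composer environment; the connection id my_gcp_conn, the project id and the local key path are placeholders.

# Sketch: register a GCP connection whose credentials are the key JSON itself,
# so nothing has to live on a GCS path. my_gcp_conn and my-project are placeholders.
import json

from airflow import settings
from airflow.models import Connection

with open('/path/to/downloaded-key.json') as f:  # a local file, not a gs:// URI
    keyfile_contents = f.read()

conn = Connection(
    conn_id='my_gcp_conn',
    conn_type='google_cloud_platform',
    extra=json.dumps({
        # These are the extras keys that the Keyfile JSON / Project Id / Scopes
        # boxes of the connection form map to.
        'extra__google_cloud_platform__keyfile_dict': keyfile_contents,
        'extra__google_cloud_platform__project': 'my-project',
        'extra__google_cloud_platform__scope':
            'https://www.googleapis.com/auth/cloud-platform',
    }),
)

session = settings.Session()
session.add(conn)
session.commit()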

2
votes

This doesn't make much sense to me. I don't use Google Cloud, so maybe it's just my lack of knowledge here:

If you're trying to set up a connection to GCP, how can you store your credentials inside GCP and expect to connect from your Airflow server? It's a chicken-and-egg thing.

Looking at gcp_api_base_hook.py in the Airflow repo, it looks like it expects you to specify a key_path and/or a keyfile_dict in the extra JSON properties of the connection, and the logic for how it connects is here.
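
Roughly, that credential resolution looks like this (a simplified sketch of the behaviour described above, not the actual Airflow source code):

# Simplified sketch of how the GCP base hook picks credentials, based on the
# behaviour described above (not the actual Airflow source).
import json

import google.auth
from google.oauth2 import service_account

def get_credentials(key_path=None, keyfile_dict=None, scopes=None):
    if key_path:
        # key_path must point at a file on the worker's local filesystem.
        return service_account.Credentials.from_service_account_file(
            key_path, scopes=scopes)
    if keyfile_dict:
        # keyfile_dict is the key JSON pasted directly into the connection extras.
        return service_account.Credentials.from_service_account_info(
            json.loads(keyfile_dict), scopes=scopes)
    # Otherwise fall back to the environment's default credentials.
    credentials, _ = google.auth.default(scopes=scopes)
    return credentials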

1
votes

Add the following to your Extras field:

'{"extra__google_cloud_platform__scope":"https://www.googleapis.com/auth/cloud-platform", "extra__google_cloud_platform__project":"{GOOGLE_CLOUD_PROJECT}", "extra__google_cloud_platform__key_path":"/path/to/gce-key.json"}'
0
votes

Cloud Composer should set up a default connection for you that doesn't require you to specify the JSON key. For me it worked for GCS and BigQuery without any additional work.

If you create your own service account, copy the JSON key into the Composer bucket that gets created. That file/path is what you'll use in the Extras field. I think Composer mounts the bucket into the file system using a gs: or gcs: prefix; there should be a reference to it in the airflow.cfg file that's in the bucket.

I don't have one spun up right this moment to tell you for certain, so I'm working from memory.

0
votes

Since Cloud Composer lives in a GKE cluster, you can store your service account key as a Kubernetes secret and then use it in conjunction with the KubernetesPodOperator.
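
For example, something along these lines. This is a sketch that assumes the key has already been created as a Kubernetes secret named service-account with a key.json entry; the secret name, mount path, image and DAG id are placeholders.

# Sketch: mount a Kubernetes secret holding the service account key into a pod
# launched from Composer. Secret name, mount path and image are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.contrib.kubernetes.secret import Secret
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Mount the secret "service-account" as files under /var/secrets/google.
gcp_key = Secret(
    deploy_type='volume',
    deploy_target='/var/secrets/google',
    secret='service-account',
    key='key.json',
)

with DAG('use_service_account_key', start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    run_with_key = KubernetesPodOperator(
        task_id='run_with_service_account',
        name='run-with-service-account',
        namespace='default',
        image='google/cloud-sdk:slim',
        cmds=['gcloud', 'auth', 'activate-service-account',
              '--key-file=/var/secrets/google/key.json'],
        secrets=[gcp_key],
        env_vars={'GOOGLE_APPLICATION_CREDENTIALS': '/var/secrets/google/key.json'},
    )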

0
votes

The Composer instance creates a GCS bucket where all the DAGs and plugins are stored. You need to keep the JSON file in the data folder of that bucket and then give the path as the mapped location, e.g. '/home/airflow/gcs/data/<service.json>'. For more details see https://cloud.google.com/composer/docs/concepts/cloud-storage
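
Tying that back to the original Dataproc use case: once the key sits under data/ in the Composer bucket, point the connection's key_path extra at the mapped local path and pass that connection id to the operator. A rough sketch, assuming the Airflow 1.10 contrib DataprocClusterCreateOperator (the exact parameter set varies a bit across versions); the connection id my_gcp_conn, cluster name, project, zone/region and the service.json filename are placeholders.

# Sketch: start a Dataproc cluster using a connection whose key_path points at
# the bucket's data/ folder via the /home/airflow/gcs/data/ mount.
# The connection's extras would contain, for example:
#   extra__google_cloud_platform__key_path = /home/airflow/gcs/data/service.json
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

with DAG('create_dataproc_cluster', start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        cluster_name='my-cluster',    # placeholder
        project_id='my-project',      # placeholder
        num_workers=2,
        zone='us-central1-a',         # placeholder
        region='us-central1',         # placeholder
        gcp_conn_id='my_gcp_conn',    # the connection described above
    )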