0
votes

I need to copy files from an FTP server into a specific GCS location. I am using ftp_hook to download the file into the /data folder. I need to move this file to a different GCS bucket instead of the Composer GCS bucket.

I am trying to use the GoogleCloudStorageToGoogleCloudStorageOperator to copy files from the Composer bucket to the desired bucket. For that, I need to read the Composer bucket name in an Airflow task. I don't want to add it as a custom Variable, because my Composer environment itself is created dynamically. So how can I get the name of the Composer bucket in which my data folder resides?

1
Not sure where the bucket name is decided, then. Where do you want to read it from? Do you want to enter it manually, or in an automated way? Can you use the GCS API? - bosnjak
I want to read it in an automated way. I can use the GCS API, but with the GCS API I can't check whether a bucket belongs to Composer or not. - sag
Hey, I have just updated my answer; I think it will help you! - Iñigo

1 Answers

3
votes

UPDATE:

I've just discovered (maybe it's something new) that you can read the bucket name from an environment variable, which Composer defines automatically.

import os
COMPOSER_BUCKET = os.environ["GCS_BUCKET"]
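
As a minimal sketch of how that variable might be used inside a task, the helper below builds the gs:// URI of a file in the data folder. The function name `composer_data_uri` and its `env` parameter are hypothetical, and it assumes GCS_BUCKET holds the bare bucket name; a leading gs:// is stripped defensively in case it does not.

```python
import os


def composer_data_uri(filename, env=None):
    """Return the gs:// URI of a file in the Composer bucket's data folder."""
    # Allow a dict to be passed in so the helper can be tried outside Composer.
    env = os.environ if env is None else env
    bucket = env.get("GCS_BUCKET")
    if not bucket:
        raise RuntimeError("GCS_BUCKET is not set; is this running in Composer?")
    # Assumption: the value is a bare bucket name. Strip a scheme just in case.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    return "gs://{}/data/{}".format(bucket, filename.lstrip("/"))
```

Failing loudly when the variable is missing makes it obvious when the DAG is being parsed somewhere other than a Composer worker.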

ORIGINAL:

I'm not 100% sure whether you want to do this dynamically (i.e., so that the same DAG would work in another Composer environment without any modification). Either way, this is what I thought of:

  • (Not dynamically) You can find the bucket that Composer uses by clicking on the environment; it is listed under "DAGs folder" (that value is actually the folder where the DAGs are, so just take out /dags).

  • (Dynamically) Since what you want is to copy files from Composer to GCS, you could use the FileToGoogleCloudStorageOperator and point it at the local file that is mapped to the Composer bucket. Note that the local storage and the Composer bucket are mapped to each other, so accessing the path /home/airflow/gcs/data/file1 is "the same" as accessing gs://<bucket>/data/file1.

  • (Semi-Dynamically) You can use the Composer API to get the environment details and parse the bucket name out of them. Of course, you will need to know the environment's name, location and project beforehand.
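
For the first and third options, "taking out /dags" can be sketched as below. This assumes the DAGs folder value has the form gs://<bucket>/dags (I believe the Composer API reports it in the environment config as `dagGcsPrefix`, but treat that field name as an assumption). The function name `bucket_from_dag_prefix` is hypothetical.

```python
def bucket_from_dag_prefix(dag_gcs_prefix):
    """Extract the bucket name from a value such as 'gs://<bucket>/dags'."""
    scheme = "gs://"
    if not dag_gcs_prefix.startswith(scheme):
        raise ValueError("expected a gs:// URI, got: {!r}".format(dag_gcs_prefix))
    # Everything between 'gs://' and the next '/' is the bucket name.
    bucket = dag_gcs_prefix[len(scheme):].split("/", 1)[0]
    if not bucket:
        raise ValueError("no bucket name found in: {!r}".format(dag_gcs_prefix))
    return bucket
```

The example bucket name in a call such as `bucket_from_dag_prefix("gs://example-bucket/dags")` is made up; the real value comes from your own environment.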

Of these three, I'd say the one that uses the FileToGoogleCloudStorageOperator is the cleanest and easiest.
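
To make the local-path mapping concrete, here is a small sketch that translates between the two views of the same file. It assumes the bucket is mounted at /home/airflow/gcs, as described above; the constant `GCS_MOUNT` and the function name `local_to_gcs_uri` are hypothetical, and the bucket name is only needed for the gs:// side.

```python
import posixpath

# Assumption: Composer mounts its bucket at this path on the workers.
GCS_MOUNT = "/home/airflow/gcs"


def local_to_gcs_uri(local_path, bucket):
    """Map a path under the Composer mount to the equivalent gs:// URI."""
    norm = posixpath.normpath(local_path)
    if norm != GCS_MOUNT and not norm.startswith(GCS_MOUNT + "/"):
        raise ValueError("{!r} is not under {}".format(local_path, GCS_MOUNT))
    # Drop the mount prefix; what remains is the object path in the bucket.
    relative = norm[len(GCS_MOUNT):].lstrip("/")
    return "gs://{}/{}".format(bucket, relative)
```

With the operator approach you only ever need the local side (for example /home/airflow/gcs/data/file1) as the source, so the Composer bucket name never has to be looked up at all.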