1 vote

Problem: I want to copy files from a folder in a Google Cloud Storage bucket (e.g., Folder1 in Bucket1) to another bucket (e.g., Bucket2). I can't find any Airflow operator for Google Cloud Storage that copies files.


3 Answers

1 vote

I just found a new operator in contrib, uploaded two hours ago: https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_to_gcs.py. It's called GoogleCloudStorageToGoogleCloudStorageOperator and copies an object from one bucket to another, with renaming if requested.
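The renaming works by prefix rewriting: when source_object contains a wildcard, everything before the wildcard is replaced by destination_object in each matched object's name. Here is a minimal pure-Python sketch of that rule; the helper name `rewrite_destination` is ours, not part of Airflow:

```python
def rewrite_destination(object_name: str, source_object: str,
                        destination_object: str) -> str:
    """Illustrative sketch (not Airflow code) of the operator's rename rule:
    swap the pre-wildcard prefix of source_object for destination_object."""
    # The prefix is everything before the wildcard in source_object.
    prefix = source_object.split("*", 1)[0]
    # Replace that prefix with destination_object to build the new name.
    return destination_object + object_name[len(prefix):]

# An object 'sales/sales-2017/january.avro' matched by
# source_object='sales/sales-2017/*.avro' with
# destination_object='copied_sales/2017/' lands at
# 'copied_sales/2017/january.avro'.
```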

0 votes

I know this is an old question, but I found myself dealing with this task too. Since I'm using Google Cloud Composer, GoogleCloudStorageToGoogleCloudStorageOperator was not available in the current version. I managed to solve the issue with a simple BashOperator:

    from datetime import timedelta

    from airflow import models
    from airflow.operators.bash_operator import BashOperator

    with models.DAG(
            dag_name,
            schedule_interval=timedelta(days=1),
            default_args=default_dag_args) as dag:

        copy_files = BashOperator(
            task_id='copy_files',
            # -m runs the copy in parallel; buckets are gs://... URLs
            bash_command='gsutil -m cp <Source Bucket> <Destination Bucket>'
        )

It's very straightforward, can create folders if you need to, and lets you rename your files.
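For the question's case of copying a whole folder, the bash_command would use gsutil's recursive flag. A minimal sketch, assuming the bucket and folder names from the question (the `build_copy_cmd` wrapper is ours, purely to show the command's shape):

```shell
# Hypothetical helper (not part of Airflow): builds the gsutil command
# to recursively copy a folder from one bucket to another.
build_copy_cmd() {
  # $1 = source bucket/folder, $2 = destination bucket
  # -m: parallel transfer; -r: recurse into the folder
  echo "gsutil -m cp -r gs://$1 gs://$2"
}

build_copy_cmd "Bucket1/Folder1" "Bucket2"
# → gsutil -m cp -r gs://Bucket1/Folder1 gs://Bucket2
```

The resulting string is what you would pass as bash_command to the BashOperator.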

0 votes

You can use GoogleCloudStorageToGoogleCloudStorageOperator.

The code below moves every CSV file from the source bucket to the destination bucket (move_object=True deletes the source objects after copying).

Package: https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/gcs_to_gcs/index.html

    backup_file = GoogleCloudStorageToGoogleCloudStorageOperator(
        task_id='Move_File_to_backupBucket',
        source_bucket='adjust_data_03sept2020',
        source_object='*.csv',  # wildcard: matches every CSV in the bucket
        destination_bucket='adjust_data_03sept2020_backup',
        move_object=True,  # move (copy then delete source) instead of copy
        google_cloud_storage_conn_id='connection_name',
        dag=dag
    )