Problem: I want to copy files from a folder in a Google Cloud Storage bucket (e.g., Folder1 in Bucket1) to another bucket (e.g., Bucket2). I can't find any Airflow operator for Google Cloud Storage that copies files.
3 Answers
I just found a new operator in contrib, uploaded 2 hours ago: https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_to_gcs.py
It is called GoogleCloudStorageToGoogleCloudStorageOperator and should copy an object from one bucket to another, renaming it if requested.
I know this is an old question, but I found myself dealing with this task too. I'm using Google Cloud Composer, and GoogleCloudStorageToGoogleCloudStorageOperator was not available in its current version.
I managed to solve this by using a simple BashOperator:
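from datetime import timedelta

from airflow import models
from airflow.operators.bash_operator import BashOperator

# dag_name and default_dag_args must be defined elsewhere in your DAG file.
with models.DAG(
        dag_name,
        schedule_interval=timedelta(days=1),
        default_args=default_dag_args) as dag:
    # Replace the placeholders with gs:// URLs; add -r after cp to copy a folder recursively.
    copy_files = BashOperator(
        task_id='copy_files',
        bash_command='gsutil -m cp <Source Bucket> <Destination Bucket>'
    )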
It is very straightforward, and it can create folders if you need them and rename your files.
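For the folder case from the question (Folder1 in Bucket1 copied to Bucket2), the bash_command string could be built as in the rough sketch below. The helper name build_gsutil_copy_command is made up for illustration and is not part of Airflow or gsutil; the sketch assumes Python 3 (for shlex.quote) and that gsutil is installed and authenticated on the worker.

```python
import shlex


def build_gsutil_copy_command(source_bucket, source_folder, destination_bucket):
    """Return a gsutil command that recursively copies one folder to another bucket.

    Hypothetical helper for illustration only.
    """
    # Normalise the folder so 'Folder1', '/Folder1' and 'Folder1/' behave the same.
    source_url = 'gs://{}/{}'.format(source_bucket, source_folder.strip('/'))
    destination_url = 'gs://{}/'.format(destination_bucket)
    # -m runs the copy in parallel; -r recurses into the folder.
    return 'gsutil -m cp -r {} {}'.format(
        shlex.quote(source_url), shlex.quote(destination_url))


bash_command = build_gsutil_copy_command('Bucket1', 'Folder1', 'Bucket2')
print(bash_command)  # gsutil -m cp -r gs://Bucket1/Folder1 gs://Bucket2/
```

The resulting string can be passed as bash_command to the BashOperator. Quoting the URLs guards against object names that contain spaces or shell metacharacters.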
You can use GoogleCloudStorageToGoogleCloudStorageOperator.
The code below moves all the .csv files from the source bucket to the destination bucket.
Package: https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/gcs_to_gcs/index.html
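from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

backup_file = GoogleCloudStorageToGoogleCloudStorageOperator(
    task_id='Move_File_to_backupBucket',
    source_bucket='adjust_data_03sept2020',
    source_object='*.csv',  # wildcard: every object whose name ends in .csv
    destination_bucket='adjust_data_03sept2020_backup',
    move_object=True,  # delete each source object after it is copied
    google_cloud_storage_conn_id='connection_name',
    dag=dag
)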
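To copy only one folder rather than the whole bucket, as in the original question, the same operator can be pointed at a folder prefix followed by a wildcard. The sketch below only builds the keyword arguments, so it runs without Airflow installed; folder_copy_kwargs is a made-up helper name, and the bucket and folder names are the examples from the question. As far as I can tell, objects keep their original names in the destination bucket when destination_object is left unset, but check this against your Airflow version.

```python
def folder_copy_kwargs(task_id, source_bucket, folder, destination_bucket, move=False):
    """Build keyword arguments for GoogleCloudStorageToGoogleCloudStorageOperator.

    Hypothetical helper for illustration only.
    """
    prefix = folder.strip('/')
    return {
        'task_id': task_id,
        'source_bucket': source_bucket,
        # A single wildcard after the folder prefix selects every object under it.
        'source_object': prefix + '/*',
        'destination_bucket': destination_bucket,
        # False copies the objects; True would also delete them from the source.
        'move_object': move,
    }


kwargs = folder_copy_kwargs('copy_folder1', 'Bucket1', 'Folder1', 'Bucket2')

# In a DAG file (requires an Airflow version that ships the contrib operator):
# from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator
# copy_folder = GoogleCloudStorageToGoogleCloudStorageOperator(dag=dag, **kwargs)
```

Leaving move_object at False keeps the source files in place, which matches the "copy" wording of the question rather than the "move" behaviour shown above.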