I'm new to Airflow. I am trying to save a result to a file in another bucket (not the Airflow one). I can save to a file at '/home/airflow/gcs/data/test.json' and then use gcs_hook.GoogleCloudStorageHook to copy it to the other bucket. Here is the code:
import json

from airflow.contrib.hooks import gcs_hook

def write_file_func(**context):
    # Write the JSON payload into the Composer data folder (gs://<airflow-bucket>/data).
    file = '/home/airflow/gcs/data/test.json'
    with open(file, 'w') as f:
        f.write(json.dumps({"name": "aaa", "age": "10"}))

def upload_file_func(**context):
    # Copy the file to the target bucket, then delete the original.
    conn = gcs_hook.GoogleCloudStorageHook()
    source_bucket = 'source_bucket'
    source_object = 'data/test.json'
    target_bucket = 'target_bucket'
    target_object = 'test.json'
    conn.copy(source_bucket, source_object, target_bucket, target_object)
    conn.delete(source_bucket, source_object)
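For context, this is roughly how the two callables are wired into the DAG (the DAG id, schedule, and task ids below are placeholders, not my exact setup):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Placeholder DAG definition; ids and schedule are not my real values.
    with DAG(dag_id='write_and_copy_to_gcs',
             start_date=datetime(2020, 1, 1),
             schedule_interval=None) as dag:

        write_file = PythonOperator(
            task_id='write_file',
            python_callable=write_file_func,
            provide_context=True,  # Airflow 1.x: pass **context to the callable
        )

        upload_file = PythonOperator(
            task_id='upload_file',
            python_callable=upload_file_func,
            provide_context=True,
        )

        write_file >> upload_file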
My questions are:
Can we write directly to a file in the target bucket? I didn't find any method for that in gcs_hook.
I tried to use google.cloud.storage's bucket.blob('test.json').upload_from_string() (roughly the snippet at the end of this post), but Airflow keeps saying "The DAG isn't available in the server's DAGBag", which is very annoying. Are we not allowed to use that API in a DAG?
If we can use the google.cloud.storage/bigquery APIs directly, what's the difference between those and the Airflow hooks, like gcs_hook/bigquery_hook?
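For reference, the direct google.cloud.storage attempt mentioned in the second question looked roughly like this (the function name and bucket name are placeholders):

    import json

    from google.cloud import storage

    def upload_string_func(**context):
        # Upload the JSON string straight to the target bucket,
        # without writing a local file first.
        client = storage.Client()
        bucket = client.bucket('target_bucket')  # placeholder bucket name
        bucket.blob('test.json').upload_from_string(
            json.dumps({"name": "aaa", "age": "10"}),
            content_type='application/json',
        )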
Thanks