How to upload multiple files to a Google Cloud Storage bucket as a transaction

Question

Use Case:

Upload multiple files into a Cloud Storage bucket, then use that data as the source for a BigQuery import. The bucket name serves as the metadata that determines which sharded table the data should go into.

Question:

To prevent a partial import into the BigQuery table, I would ideally like to do the following:

1. Upload the files into a staging bucket
2. Verify that all files have been uploaded correctly
3. Rename the staging bucket to its final name (for example, gs://20130112)
4. Trigger the BigQuery import to load the bucket into a sharded table

Since gsutil does not seem to support bucket rename, what are the alternative ways to accomplish this?

How about copying the files in the cloud from the staging bucket to the final bucket? That is: gsutil cp /local/file gs://staging-bucket; gsutil cp gs://staging-bucket/* gs://final-bucket - fejta
Thanks. I think it's doing a copy and then a delete instead of a rename. But yes, it should be a lot more reliable than copying from local. - user2020564
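
The two-step copy suggested in the comment above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the helper names promote_commands and run_in_order are made up for this example, the bucket names are the placeholders from the comment, and it assumes the objects sit at the top level of the staging bucket so that a single * wildcard matches them. Note that the bucket-to-bucket copy is still not atomic; it just avoids a second upload from local disk.

```python
import subprocess


def promote_commands(local_files, staging_url, final_url):
    """Build the two gsutil commands: local -> staging, then staging -> final.

    Hypothetical helper; the URLs are placeholders supplied by the caller.
    """
    staging = staging_url.rstrip("/")
    return [
        # Step 1: upload the local files into the staging bucket.
        ["gsutil", "cp", *local_files, staging],
        # Step 2: copy within the cloud from staging to the final bucket.
        ["gsutil", "cp", staging + "/*", final_url],
    ]


def run_in_order(commands, runner=subprocess.call):
    """Run each command in turn and stop at the first non-zero exit status."""
    for cmd in commands:
        status = runner(cmd)
        if status != 0:
            return status
    return 0
```

Calling run_in_order(promote_commands(["/local/file"], "gs://staging-bucket", "gs://final-bucket")) would run the second copy only if the upload to staging exited with status 0.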

Mike Schwartz · Accepted Answer · 2013-01-30T17:04:39

Google Cloud Storage does not support renaming buckets or, more generally, any atomic way to operate on more than one object at a time.

If your main concern is that all objects were uploaded correctly (as opposed to needing to ensure the bucket content is only visible once all objects are uploaded), gsutil cp supports that: if any object fails to upload, it reports the number that failed and exits with a non-zero status.

So, a possible implementation would be a script that runs gsutil cp to upload all your files, and then checks the gsutil exit status before creating the BigQuery table load job.

Mike Schwartz, Google Cloud Storage team
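
A minimal sketch of the script described in the answer might look like the following. It is an illustration under assumptions, not the answer author's implementation: the function name upload_then_load is made up, the load command is passed in by the caller rather than hard-coded, and the only behavior relied on is that gsutil cp exits with a non-zero status when any upload fails.

```python
import subprocess


def upload_then_load(files, bucket_url, load_cmd, runner=subprocess.call):
    """Upload files with gsutil cp, then start the load job only on success.

    files      -- local paths to upload
    bucket_url -- destination bucket, e.g. "gs://20130112"
    load_cmd   -- caller-supplied command (a list) that creates the
                  BigQuery load job; its exact arguments are an assumption
    runner     -- callable that runs a command and returns its exit status
    """
    upload_status = runner(["gsutil", "cp", *files, bucket_url])
    if upload_status != 0:
        # At least one object failed to upload, so skip the load entirely.
        return upload_status
    return runner(list(load_cmd))
```

For example, load_cmd could be something like ["bq", "load", "mydataset.events_20130112", "gs://20130112/*"], where the dataset and table names are hypothetical and a real load will usually also need a source format and schema. The function returns the upload's exit status if the upload failed, otherwise the exit status of the load command.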
