3
votes

Use Case:

Upload multiple files into a Cloud Storage bucket, then use that data as the source for a BigQuery import. The bucket name serves as the metadata that determines which sharded table the data should go into.

Question:

To prevent a partial import into the BigQuery table, I would ideally like to do the following:

  • Upload the files into a staging bucket
  • Verify all files have been uploaded correctly
  • Rename the staging bucket to its final name (for example, gs://20130112)
  • Trigger the BigQuery import to load the bucket into a sharded table

Since gsutil does not seem to support bucket rename, what are the alternative ways to accomplish this?

2
How about copying the files within the cloud from the staging bucket to the final bucket? That is, gsutil cp /local/file gs://staging-bucket; gsutil cp gs://staging-bucket gs://final-bucket - fejta
Thanks. I think that does a copy and then a delete rather than a rename. But yes, it should be a lot more reliable than copying from local. - user2020564

2 Answers

5
votes

Google Cloud Storage does not support renaming buckets, nor, more generally, does it provide an atomic way to operate on more than one object at a time.

If your main concern is that all objects were uploaded correctly (as opposed to needing to ensure the bucket content is only visible once all objects are uploaded), gsutil cp supports that -- if any object fails to upload, it will report the number that failed to upload and exit with a non-zero status.

So, a possible implementation would be a script that runs gsutil cp to upload all your files, and then checks the gsutil exit status before creating the BigQuery table load job.
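A minimal sketch of such a script, in Python. It assumes that the gsutil and bq command-line tools are installed and authenticated, that the files are CSV, and that the destination table already exists or can take a schema from the load. The function name upload_then_load, the runner parameter and the example bucket and table names are hypothetical, chosen only for illustration.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def upload_then_load(files: Sequence[str], bucket: str, table: str,
                     runner: Runner = run) -> bool:
    """Upload files with gsutil cp; start the BigQuery load only on success.

    Returns True only if both the upload and the load exit with status 0.
    """
    # gsutil exits with a non-zero status if any object fails to upload.
    upload = ["gsutil", "-m", "cp", *files, f"gs://{bucket}/"]
    if runner(upload) != 0:
        return False  # skip the load so the table never sees partial data

    # Assumption: CSV input, loaded from every object in the bucket.
    load = ["bq", "load", "--source_format=CSV", table, f"gs://{bucket}/*"]
    return runner(load) == 0


if __name__ == "__main__":
    # Hypothetical file, bucket and table names.
    ok = upload_then_load(["part1.csv", "part2.csv"],
                          "20130112", "mydataset.events_20130112")
    raise SystemExit(0 if ok else 1)
```

Passing the command runner as a parameter keeps the upload-then-load ordering easy to check without touching the network.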

Mike Schwartz, Google Cloud Storage team

2
votes

Object names are actually flat in Google Cloud Storage; from the service's perspective, '/' is just another character in the name. The folder abstraction is provided by clients, like gsutil and various GUI tools. Renaming a folder requires clients to request a sequence of copy and delete operations on each object in the folder. There is no atomic way to rename a folder.
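To illustrate why a folder rename cannot be atomic, here is a small in-memory model in Python. A plain dict stands in for a bucket's flat namespace; this is not the Cloud Storage API, and rename_prefix is a hypothetical helper. Each object is copied and then deleted individually, so a failure partway through would leave some objects under the old prefix and some under the new one.

```python
from typing import Dict


def rename_prefix(bucket: Dict[str, bytes], old: str, new: str) -> int:
    """Rename every object whose name starts with `old`, one at a time.

    `bucket` is an in-memory stand-in for a flat object namespace;
    '/' has no special meaning, it is just part of the name.
    Returns the number of objects moved.
    """
    moved = 0
    # Snapshot the matching names first, since the dict is mutated below.
    for name in sorted(n for n in bucket if n.startswith(old)):
        target = new + name[len(old):]
        bucket[target] = bucket[name]  # copy step
        del bucket[name]               # delete step
        moved += 1
    return moved


if __name__ == "__main__":
    objects = {"staging/a.csv": b"1", "staging/b.csv": b"2", "other.txt": b"3"}
    rename_prefix(objects, "staging/", "20130112/")
    print(sorted(objects))
```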

Mike Schwartz, Google Cloud Storage team