
We have enabled the GA 360 export to BigQuery, which creates a new table every day. For further processing, we need the daily export to be transferred to IBM Cloud Object Storage.

The easiest solution would be to use the CLI:

a) export from BigQuery to Google Cloud Storage

bq extract

and then

b) move the extract to IBM Cloud Object Storage

gsutil cp
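
For example, something along these lines (the project, dataset, table and bucket names are placeholders, and the gsutil step assumes a .boto file configured with IBM HMAC credentials and the IBM COS S3-compatible endpoint):

# a) export the daily table from BigQuery to a GCS bucket as newline-delimited JSON
bq extract --destination_format=NEWLINE_DELIMITED_JSON \
  'myproject:ga360.ga_sessions_20190101' \
  gs://my-export-bucket/ga_sessions_20190101.json

# b) copy the exported file from GCS to the IBM bucket over its S3-compatible interface
gsutil cp \
  gs://my-export-bucket/ga_sessions_20190101.json \
  s3://my-ibm-bucket/ga_sessions_20190101.json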

A couple of questions in this regard:

1) Can we skip Google Cloud Storage in between? Is it possible to export directly to an IBM bucket?

2) We have not used Dataflow before. Could this use case be a candidate for Dataflow? Or what would be the easiest solution natively within GCP?

I will add this as a comment because @Pentium10 provided a good answer. 1) Dataflow (Apache Beam) does not have an IBM Storage connector, so that option is out unless you write your own connector, which is not simple. 2) BigQuery does not support IBM Storage as an export destination. 3) Your question does not specify the size of the export data. There is a runtime limit for Cloud Functions that will cap (time out) how much data it can move each time. You will need to test to find the limit. – John Hanley
Thank you @John Hanley. These are very helpful comments. We are expecting the daily size to be less than a GB. – John Smith
One GB of data transfer is not a problem for Cloud Functions. Go with @Pentium10's answer. – John Hanley
Cloud Functions can start the export job asynchronously, so timing out there is not a problem. Another Cloud Function will kick in when the file is ready in Google Cloud Storage. This is the ideal setup. – Pentium10
@Pentium10 - I was referring to the movement of data from Cloud Storage to IBM Storage, part 3 of your answer. If that function times out, the data transfer will fail. – John Hanley

1 Answer


The best option you have here is to run bq extract on a schedule; for that you can use Cloud Scheduler combined with Cloud Functions:

  1. Cloud Scheduler hits a Cloud Functions endpoint.
  2. The GCF runs the extract job to GCS.
  3. Another GCF observes the new file in the GCS bucket and uploads it to IBM Cloud Object Storage.

It took us less than an hour to wire this up and get it working, since you essentially only have to set up two Cloud Functions and the Scheduler job.
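
Roughly, the wiring might look like this (a sketch only: the function names, runtime, schedule, URL and bucket below are placeholders, and the source code for the two functions, one that starts the extract job and one that pushes the file to IBM COS, still has to be written):

# HTTP-triggered function that starts the BigQuery extract job to GCS
gcloud functions deploy export_ga_table \
  --runtime=python37 --trigger-http

# Cloud Scheduler job that calls that function once a day
gcloud scheduler jobs create http daily-ga-export \
  --schedule="0 6 * * *" \
  --uri="https://us-central1-myproject.cloudfunctions.net/export_ga_table"

# GCS-triggered function that fires when the export file lands and uploads it to IBM COS
gcloud functions deploy copy_to_ibm \
  --runtime=python37 \
  --trigger-resource=my-export-bucket \
  --trigger-event=google.storage.object.finalize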

Later this year, there will be a way to detect via Cloud Functions when a BigQuery table has been created, so you could take the Scheduler out and export the table immediately after it has been created.

As an additional step, you could delete the exported GCS file with another HTTP-triggered Cloud Function once you have completed the move. You could also copy the data into a partitioned table to take advantage of the reduced long-term storage pricing.
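
A rough sketch of that cleanup step (names are placeholders, and the partitioned archive table is assumed to exist already):

# remove the exported file from GCS once the copy to IBM COS has succeeded
gsutil rm gs://my-export-bucket/ga_sessions_20190101.json

# copy the daily table into the matching partition of an archive table; partitions
# left untouched for 90 days drop to long-term storage pricing automatically
bq cp 'myproject:ga360.ga_sessions_20190101' \
  'myproject:archive.ga_sessions$20190101'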