
What I want to achieve: every time I upload a file to Cloud Storage, its contents should automatically be appended to a BigQuery table.

What I have working so far: the Cloud Storage trigger for my Cloud Function works fine, and the Dataflow template "Cloud Storage to BigQuery" works fine on its own.

However, they don't work together. How can I get the Cloud Function trigger to activate the Dataflow part? Is it a script, or just some setting I have missed?

Something is missing to connect them all. I'm relatively new to this; I've been searching for tutorials, but maybe I have been using the wrong keywords, because I can't find any relevant ones.


2 Answers


If I understood your question correctly, your desired workflow will be:

Cloud Function ---> Google Cloud Storage (GCS) ---> Cloud Dataflow ---> BigQuery

The most important part for you is how to trigger a Dataflow job when a new file is written to GCS. There is no built-in GCS feature that does this, so you will need to orchestrate it yourself.

For that, you can add another Cloud Function that is triggered when a new file is created in the GCS bucket. In that Cloud Function, you just start the Dataflow pipeline.

The architecture will end up looking something like:

Cloud Function 1 ---> GCS ---> Cloud Function 2 ---> Dataflow job ---> BigQuery

where Cloud Function 1 is your current Cloud Function, and Cloud Function 2 is the new one that is triggered when a new file arrives in the bucket and launches your Dataflow job.
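
As a rough sketch, Cloud Function 2 could look like this in Python (assuming a GCS "finalize" trigger and the public GCS_Text_to_BigQuery template; the project, region, bucket, table, schema and UDF paths are placeholders you would replace, and the parameter names should be checked against the template you actually use; requirements.txt needs google-api-python-client):

    # main.py - background Cloud Function triggered by google.storage.object.finalize
    from googleapiclient.discovery import build

    PROJECT = "my-project"      # placeholder
    REGION = "us-central1"      # placeholder
    TEMPLATE = "gs://dataflow-templates/latest/GCS_Text_to_BigQuery"

    def trigger_dataflow(event, context):
        """Launches a Dataflow template job for the file that was just uploaded."""
        input_file = f"gs://{event['bucket']}/{event['name']}"

        dataflow = build("dataflow", "v1b3", cache_discovery=False)
        request = dataflow.projects().locations().templates().launch(
            projectId=PROJECT,
            location=REGION,
            gcsPath=TEMPLATE,
            body={
                "jobName": f"gcs-to-bq-{context.event_id}",
                "parameters": {
                    # Parameter names of the GCS_Text_to_BigQuery template (check the docs).
                    "inputFilePattern": input_file,
                    "JSONPath": "gs://my-bucket/schema/schema.json",                      # placeholder
                    "javascriptTextTransformGcsPath": "gs://my-bucket/udf/transform.js",  # placeholder
                    "javascriptTextTransformFunctionName": "transform",                   # placeholder
                    "outputTable": f"{PROJECT}:my_dataset.my_table",                      # placeholder
                    "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",            # placeholder
                },
            },
        )
        response = request.execute()
        print("Launched Dataflow job:", response["job"]["id"])

You would deploy it with something like gcloud functions deploy trigger_dataflow --runtime python39 --trigger-resource YOUR_BUCKET --trigger-event google.storage.object.finalize, and give the function's service account permission to launch Dataflow jobs.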

I'd like to mention that you can avoid creating the second Cloud Function and the Dataflow job altogether if, instead of a native table in BigQuery, you choose to have an external table backed by GCS. It has pros and cons; depending on your case, it might be a good idea.
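
For reference, here is roughly what that external-table setup could look like with the Python BigQuery client (the table ID and URI pattern are placeholders). Since BigQuery reads the files straight from GCS at query time, newly uploaded files show up without any trigger or pipeline:

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my-project.my_dataset.uploads_external"  # placeholder

    # External data configuration: BigQuery scans the matching GCS files at query
    # time, so there is no load/append step to orchestrate.
    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://my-bucket/uploads/*.csv"]  # placeholder
    external_config.autodetect = True

    table = bigquery.Table(table_id)
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)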


There is a REST API for Dataflow, and I think this will be the option for you.

When you look at the examples in the documentation, you can choose between console, gcloud, and API in each example. So, if you have a currently working template that you run using the console or gcloud, you just have to convert it to a POST request like the one shown under API.
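
As an illustration, launching a classic template comes down to a single POST to the templates.launch method; here is a sketch of the URL and JSON body you could paste into Postman (project, region, file and table names are placeholders, and the parameters depend on the template you use):

    # templates.launch endpoint of the Dataflow REST API (v1b3)
    url = (
        "https://dataflow.googleapis.com/v1b3/projects/my-project"
        "/locations/us-central1/templates:launch"
        "?gcsPath=gs://dataflow-templates/latest/GCS_Text_to_BigQuery"
    )

    # JSON body of the POST request
    body = {
        "jobName": "gcs-to-bq-test",
        "parameters": {
            "inputFilePattern": "gs://my-bucket/uploads/my-file.csv",
            "outputTable": "my-project:my_dataset.my_table",
            # ...remaining parameters required by your template...
        },
    }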

You can test the request in an appropriate tool (for example Postman). Once you have a working POST request, you have to create a Cloud Function in your preferred language where you will make that request. For Node.js you can use this, for Python this, and I am sure you will be able to google tons of examples for any language available in Cloud Functions.
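
A minimal Python version of such a Cloud Function could look like the sketch below: a GCS-triggered background function that authenticates as the function's default service account and sends the POST request (project, region and template parameters are placeholders; requirements.txt needs requests and google-auth):

    import google.auth
    import requests
    from google.auth.transport.requests import Request

    PROJECT = "my-project"   # placeholder
    REGION = "us-central1"   # placeholder
    TEMPLATE = "gs://dataflow-templates/latest/GCS_Text_to_BigQuery"

    def gcs_to_bq(event, context):
        """Triggered by google.storage.object.finalize; launches the template via REST."""
        # Obtain an OAuth token for the function's default service account.
        credentials, _ = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
        credentials.refresh(Request())

        url = (
            f"https://dataflow.googleapis.com/v1b3/projects/{PROJECT}"
            f"/locations/{REGION}/templates:launch?gcsPath={TEMPLATE}"
        )
        body = {
            "jobName": f"gcs-to-bq-{context.event_id}",
            "parameters": {
                "inputFilePattern": f"gs://{event['bucket']}/{event['name']}",
                "outputTable": f"{PROJECT}:my_dataset.my_table",  # placeholder
                # ...remaining parameters required by your template...
            },
        }

        response = requests.post(
            url, json=body, headers={"Authorization": f"Bearer {credentials.token}"}
        )
        response.raise_for_status()
        print("Launched Dataflow job:", response.json()["job"]["id"])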