0
votes

I want to ingest streaming data from an API into BigQuery.

I guess the best option is to use Cloud Dataflow to ingest this data into BigQuery, but I don't know how to extract the data from the API: https://developer.tomtom.com/traffic-api

Can I extract the data within the same Dataflow pipeline, or do I have to create an instance, extract the data from there into Cloud Pub/Sub, and then use Dataflow to move the data to BigQuery?

1
Please elaborate a bit more on the use case. You can use the source directly with Dataflow if it has a source connector for that data. Otherwise, it would be easiest to first get the data into Pub/Sub and then use a Dataflow pipeline to do what you need. - Ankur
Could you specify which API you want to ingest data from? Is it also a Google product or an external one, such as AWS? - aga
Thanks for sharing more details. The API requires query parameters. How are you planning to get those query parameters? - Ankur
I only need to know whether I have to run it on an instance and ingest it into a Pub/Sub subscription, or whether I can get the data directly from a Dataflow job - J.C Guzman

1 Answer

1
votes

My assumption is that you have an API from which you want to send data to BigQuery. Since you cannot stream from the API directly, you have to hit it on a batch interval; this can be hourly or per minute, depending on the API's rate limits.
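The polling step above can be sketched as a small job that hits the API on a fixed interval. This is a sketch, not a verified integration: the flow-segment endpoint path, the query parameters, and the `TOMTOM_API_KEY` environment variable are assumptions based on the public Traffic API docs.

```python
# Minimal poller sketch for the TomTom Traffic API (endpoint and params are
# assumptions; check the Traffic API docs for the exact contract).
import os
import time
from urllib.parse import urlencode

BASE_URL = "https://api.tomtom.com/traffic/services/4/flowSegmentData/absolute/10/json"

def build_request_url(point: str, api_key: str) -> str:
    """Build the flow-segment request URL for a 'lat,lon' point."""
    return f"{BASE_URL}?{urlencode({'point': point, 'key': api_key})}"

def poll_once(point: str, api_key: str) -> dict:
    """Fetch one reading (network call; requires the `requests` package)."""
    import requests  # third-party; imported lazily so the helpers stay testable
    resp = requests.get(build_request_url(point, api_key), timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    key = os.environ["TOMTOM_API_KEY"]
    while True:  # interval chosen to respect the API's rate limits
        print(poll_once("52.41072,4.84239", key))
        time.sleep(60)
```

The polling interval (60 s here) is the "batch interval" mentioned above; widen it to hourly if the API limits demand it.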

You can have a job that reads data from this API and pumps it into Pub/Sub, then use a Dataflow pipeline to pump the data into BigQuery. Or you can have the job write directly to BigQuery. It's up to your data volume, backup strategy, and business requirements.
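The "pump into Pub/Sub" option can be sketched as follows: serialize each API reading as a JSON message and publish it to a topic, then let a Dataflow pipeline (for example, the Google-provided "Pub/Sub to BigQuery" template) subscribe and write rows to BigQuery. The project and topic names here are placeholders.

```python
# Sketch of publishing API readings to Pub/Sub for a downstream Dataflow job.
import json

def to_message(record: dict) -> bytes:
    """Pub/Sub payloads are bytes; encode one reading as UTF-8 JSON."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def publish(records, project_id: str, topic_id: str) -> None:
    """Publish records (requires `google-cloud-pubsub` and GCP credentials)."""
    from google.cloud import pubsub_v1  # third-party; imported lazily
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    for record in records:
        future = publisher.publish(topic_path, to_message(record))
        future.result()  # block until the message is accepted

if __name__ == "__main__":
    publish([{"speed": 42, "segment": "demo"}], "my-project", "traffic-topic")
```

With this split, the job only worries about the API's rate limits and serialization, while Pub/Sub buffers bursts and Dataflow handles the streaming insert into BigQuery.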