
I need to set up a data pipeline that reads from source databases such as Oracle and MySQL and loads the data into BigQuery.

How can I use google-cloud-dataflow with Python to read data from a database over a JDBC connection and write it to BigQuery tables?

Also, I have some Hive tables in an on-premises Hadoop cluster. How do I transfer this data to BigQuery?

I couldn't find the right documentation or examples for this. Can you please point me in the right direction?


I implemented something like this in my project. These are the steps to follow:

  1. Export the data from Google Cloud SQL to Google Cloud Storage as CSV by following this link.

  2. Load the CSV data from Google Cloud Storage directly into BigQuery by following this link.
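Step 1 above relies on Cloud SQL's own export feature, which only applies to Cloud SQL instances, not to an on-premises Oracle or MySQL server. As a rough sketch under that assumption, an equivalent CSV file can be produced from any Python DB-API 2.0 connection. Here the standard-library sqlite3 module stands in for the real database driver, and export_query_to_csv is a hypothetical helper name, not part of any Google library.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out, include_header=True, batch_size=1000):
    """Hypothetical helper: run `query` on a DB-API 2.0 connection and write
    the result rows to the text stream `out` as CSV.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        writer = csv.writer(out)
        if include_header:
            # cursor.description has one entry per column; item 0 is the column name.
            writer.writerow([col[0] for col in cursor.description])
        count = 0
        # Fetch in batches so a large table is never held in memory all at once.
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            count += len(batch)
        return count
    finally:
        cursor.close()


if __name__ == "__main__":
    # Assumption: sqlite3 stands in for the real source database (MySQL, Oracle, ...).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    buf = io.StringIO()
    rows = export_query_to_csv(conn, "SELECT id, name FROM users ORDER BY id", buf)
    print(rows)
    print(buf.getvalue())
```

To use this against a real source, you would pass a connection from that database's DB-API driver and write to a file opened with newline="" (as the csv module recommends), then upload the file to a Cloud Storage bucket and load it into BigQuery as in step 2. Writing a header row pairs with telling the BigQuery load job to skip the first row.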