0
votes

I want to create a Beam Dataflow job to load data from GCS into BigQuery. I will have hundreds of Parquet files spread across different folders in GCS. Is it possible to load files from different folders in one job, and is it possible to create the BigQuery dataset and tables in the Beam code itself?

My end goal is a pipeline that loads data from GCS into BigQuery. Thanks in advance.

2 Answers

0
votes

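Yes, this is a good fit for Dataflow. You can use FileIO to match and read the files from GCS and BigQueryIO to write to BigQuery. A single pipeline can read from several folders by matching more than one file pattern, and BigQueryIO can create the destination table if it does not exist, provided you give it a schema.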
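
Here is a minimal sketch of that approach in the Python SDK, in case it helps. It assumes the apache-beam[gcp] and google-cloud-bigquery packages are installed. The helper names (gcs_patterns, ensure_dataset, run) and every bucket, folder, project, table and schema value are hypothetical placeholders, not anything from the question. It uses ReadAllFromParquet rather than FileIO directly because that transform accepts a collection of file patterns. As far as I know BigQueryIO creates tables but not datasets, so the dataset is created with the BigQuery client before the pipeline starts.

```python
def gcs_patterns(bucket, folders, suffix="*.parquet"):
    """Build one glob per folder, e.g. gs://my-bucket/sales/*.parquet."""
    return [f"gs://{bucket}/{folder.strip('/')}/{suffix}" for folder in folders]


def ensure_dataset(project, dataset):
    """Create the BigQuery dataset if it is missing (call before the pipeline)."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    client.create_dataset(f"{project}.{dataset}", exists_ok=True)


def run(patterns, table, schema, argv=None):
    """Read every Parquet file matching the patterns and append to one table."""
    # Imported lazily; requires the apache-beam[gcp] package.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (
            p
            # One element per folder glob; all folders go through one pipeline.
            | "FilePatterns" >> beam.Create(patterns)
            # Expands each glob and emits every Parquet row as a dict.
            | "ReadParquet" >> beam.io.ReadAllFromParquet()
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,
                schema=schema,  # hypothetical schema; must match your files
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

To try it, you would call ensure_dataset("my-project", "my_dataset") and then run(gcs_patterns("my-bucket", ["sales", "orders/2023"]), "my-project:my_dataset.my_table", "id:INTEGER,name:STRING", argv=[...]), passing your usual Dataflow options (runner, project, region, temp_location) in argv. The batch write stages files in GCS, so a temp location is needed. If each folder should land in its own table, this sketch would need one read/write branch per folder instead.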

0
votes

As an alternative, you can use gsutil to move all the files from the different GCS folders into a single folder. Once all the files are in one folder, you can easily read the data from GCS and load it into BigQuery.