I need to migrate 70 TB of data (2,400 tables) from on-premises Hive to BigQuery. My initial plan is to copy the ORC files from Hive to Cloud Storage and then load them into BigQuery tables. Is there a better way to achieve this, either through automation or with another GCP service?

1 Answer

I would suggest using data pipelines for this. Here is a reference on how to use them: https://cloud.google.com/architecture/dw2bq/dw-bq-data-pipelines#what-is-a-data-pipeline

You can also explore the different ways to transfer your on-premises data to BigQuery here: https://cloud.google.com/architecture/dw2bq/dw-bq-migration-overview

Please also note that BigQuery can load ORC files directly from Cloud Storage, so you do not have to convert your ORC data first. Avro, Parquet, CSV, and newline-delimited JSON are supported load formats as well.
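
With 2,400 tables you will probably want to script the load step rather than run it by hand. Here is a minimal sketch that builds one `bq load --source_format=ORC` command per table. The bucket name, dataset name, table names, and the `hive_export/<table>/` folder layout are all assumptions for illustration, so adjust them to however your ORC files actually land in Cloud Storage.

```python
import shlex


def build_load_command(table, bucket, dataset, prefix="hive_export"):
    """Return the argument list for loading one table's ORC files.

    Assumes (hypothetically) that each Hive table was copied to
    gs://<bucket>/<prefix>/<table>/ in Cloud Storage.
    """
    uri = f"gs://{bucket}/{prefix}/{table}/*"
    return ["bq", "load", "--source_format=ORC", f"{dataset}.{table}", uri]


def build_all(tables, bucket, dataset):
    """Build one load command per table name."""
    return [build_load_command(t, bucket, dataset) for t in tables]


if __name__ == "__main__":
    # Placeholder names; in practice the table list would come from the
    # Hive metastore or from a listing of the bucket.
    tables = ["orders", "customers"]
    for cmd in build_all(tables, "my-migration-bucket", "warehouse"):
        # Print the commands for review before running anything.
        print(shlex.join(cmd))
```

Once the printed commands look right, each argument list can be passed to `subprocess.run(cmd, check=True)` to run the load. For 70 TB you would likely also want retries and some logging of which tables have already been loaded.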