We run our data transformations on Google Dataproc, and all our data resides in Hive tables on Dataproc. How do I transfer this data to BigQuery?
1
votes
Hello and welcome to Stack Overflow. Please read through How To Ask so it's easier to help you. Have you tried something so far? Do you have any code? Are you seeing a traceback or error? The more information you give, the more likely you are to get help.
– Willian Fuks
We are evaluating GCP's cloud offerings. Our use case is to run our Hive/Spark jobs on Dataproc and load the final data from Hive tables into BigQuery. We see that Google offers a BigQuery connector to transfer data from Hive to BigQuery; I just wanted to check what the ideal way of moving the data from Dataproc to BigQuery would be.
– bigdata
1 Answer
0
votes
Transferring data from Hive to BigQuery seems to follow a standard pattern:
- Export your Hive tables to Avro files
- Load those files into BigQuery
See an example here: Migrate hive table to Google BigQuery
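As a rough sketch of the two steps, the Python below only builds the HiveQL statement and the `bq load` command as text, so you can review them before running anything. It assumes Hive on Dataproc can write straight to a `gs://` location and that the `bq` CLI is installed; the table names, bucket path, and helper function names are hypothetical placeholders, not something from the question.

```python
import shlex
from typing import List


def hive_export_statement(source_table: str, staging_table: str, gcs_dir: str) -> str:
    """Build a HiveQL CTAS that rewrites source_table as Avro files under gcs_dir."""
    return (
        f"CREATE TABLE {staging_table}\n"
        f"STORED AS AVRO\n"
        f"LOCATION '{gcs_dir}'\n"
        f"AS SELECT * FROM {source_table}"
    )


def bq_load_command(bq_table: str, gcs_dir: str) -> List[str]:
    """Build the argument list for a `bq load` of every file under gcs_dir."""
    return [
        "bq",
        "load",
        "--source_format=AVRO",
        # Ask BigQuery to honor Avro logical types (dates, timestamps, decimals)
        # instead of loading them as their underlying primitive types.
        "--use_avro_logical_types=true",
        bq_table,
        # Hive output files usually have no extension, so match everything.
        f"{gcs_dir.rstrip('/')}/*",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    gcs_dir = "gs://example-bucket/export/orders_avro"
    print(hive_export_statement("sales.orders", "sales.orders_avro", gcs_dir) + ";")
    print(shlex.join(bq_load_command("analytics.orders", gcs_dir)))
```

Run the printed statement in Hive first, then run the printed command from a shell once the Avro files exist. With Avro, BigQuery reads the schema from the files themselves, so no separate schema file should be needed.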
As mentioned above, pay attention to type compatibility between Hive, Avro, and BigQuery.
For the first migration, I guess it would not hurt to validate that the tables in Hive and BigQuery contain the same data: https://github.com/bolcom/hive_compared_bq
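To illustrate what such a validation checks, here is a minimal in-memory sketch that fingerprints each row and counts the rows present on only one side. It is not how hive_compared_bq works internally; it assumes you have already fetched the rows (or a sample of them) from both systems as tuples, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def row_fingerprint(row) -> str:
    """Hash one row; None maps to a fixed marker so NULLs compare equal."""
    parts = ["\\N" if value is None else str(value) for value in row]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def compare_tables(hive_rows, bq_rows) -> dict:
    """Count rows that appear on only one side, respecting duplicates."""
    hive_counts = Counter(row_fingerprint(r) for r in hive_rows)
    bq_counts = Counter(row_fingerprint(r) for r in bq_rows)
    return {
        "only_in_hive": sum((hive_counts - bq_counts).values()),
        "only_in_bigquery": sum((bq_counts - hive_counts).values()),
    }


if __name__ == "__main__":
    hive_rows = [(1, "a"), (2, None)]
    bq_rows = [(1, "a"), (2, None), (3, "c")]
    print(compare_tables(hive_rows, bq_rows))
```

Because values are compared by their text form, a column that comes back as `1` on one side and `1.0` on the other shows up as a mismatch, which is one way type conversion problems surface.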