I'm trying to create an external table in Hive and another in BigQuery using the same data stored in Google Storage in Avro format wrote with Spark.
I'm using a Dataproc cluster with Spark 2.2.0, Spark-avro 4.0.0 and Hive 2.1.1
There are same differences between Avro versions/packages but If I create the table using Hive and then I write the files using Spark, I'm able to see them in Hive.
But for BigQuery is different, it is able to read Hive Avro files but NOT Spark Avro files.
Error:
The Apache Avro library failed to parse the header with the follwing error: Invalid namespace: .someField
Searching a little about the error, the problem is that Spark Avro files are different from Hive/BigQuery Avro files.
I don't know exactly how to fix this, maybe using different Avro package in Spark, but I haven't found which one is compatible with all the systems.
Also I would like to avoid tricky solutions like create a temporary table in Hive and create another using insert into ... select * from ...
I'll write a lot of data and I would like to avoid this kind of solutions
Any help would be appreciated. Thanks