0
votes

I want to read 2 Avro files of the same data set, where the schema has evolved between them:

  1. First Avro file schema: {String, String, Int}
  2. Second Avro file schema (after evolution): {String, String, Long}

(The Int field has evolved to Long.) I want to read these two Avro files and store them in a single DataFrame using Spark SQL.

To read the Avro files I am using Databricks' 'spark-avro': https://github.com/databricks/spark-avro

How can I do this efficiently?

Spark version: 2.0.1, Scala: 2.11.8

PS. I have mentioned only 2 files in this example, but in the actual scenario a file is generated daily, so there are more than 1000 such files.

Thank you in advance:)

1 Answer

0
votes

Use a union, like

{string, string, [int, long]}

Is that a valid solution for you? It should allow reading both the new and the old files.
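
Spelled out as a full Avro schema, the union above might look something like the sketch below. The question doesn't give the record or field names, so `Record`, `field1`, `field2` and `field3` are made up for illustration (the `doc` attributes say so too); swap in your real names.

```json
{
  "type": "record",
  "name": "Record",
  "doc": "Hypothetical record name, not taken from the question",
  "fields": [
    {"name": "field1", "type": "string", "doc": "hypothetical field name"},
    {"name": "field2", "type": "string", "doc": "hypothetical field name"},
    {"name": "field3", "type": ["int", "long"], "doc": "hypothetical name; the field that changed from int to long"}
  ]
}
```

As far as I know, spark-avro maps a union of int and long to Spark's LongType, so the column should come out as a long for both the old and the new files. Later spark-avro releases document an `avroSchema` read option for supplying a schema like this one; I haven't checked whether the release that pairs with Spark 2.0.1 has it, so verify that against the README for your version.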