Spark SQL: Handling schema evolution

Question

I want to read two Avro files from the same data set whose schema has evolved (an int field has been changed to long). I want to load both files into a single DataFrame using Spark SQL.

To read the Avro files I am using Databricks' 'spark-avro': https://github.com/databricks/spark-avro

How can I do this efficiently?

Spark version: 2.0.1, Scala version: 2.11.8

PS: The example mentions only two files, but in the actual scenario a file is generated daily, so there are more than 1000 such files.

Thank you in advance:)

hlagos · Accepted Answer · 2017-08-11T03:27:05

Would using a union like

{string, string, [int, long]}

be a valid solution for you? It should allow reading both the new and the old files.
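
A minimal sketch of the union idea, not tested against your data. It assumes a spark-avro release that supports the avroSchema read option, and the record name, field names (id, name, value), and input path are hypothetical placeholders, since the question does not give the real ones. The [int, long] union is the only part taken from the suggestion above. As far as I know, spark-avro maps a union of int and long to a single LongType column, so old and new files should load into one DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadEvolvedAvro {
  // Hypothetical reader schema: the record and field names are placeholders.
  // Only the ["int", "long"] union on the evolved field reflects the suggestion.
  val unionSchemaJson: String =
    """{
      |  "type": "record",
      |  "name": "Record",
      |  "fields": [
      |    {"name": "id",    "type": "string"},
      |    {"name": "name",  "type": "string"},
      |    {"name": "value", "type": ["int", "long"]}
      |  ]
      |}""".stripMargin

  // Read every Avro file matched by `path` with the same reader schema,
  // instead of letting the schema be inferred from one arbitrary file.
  // Assumption: the "avroSchema" option is available in the spark-avro version in use.
  def readAll(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.databricks.spark.avro")
      .option("avroSchema", unionSchemaJson)
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-schema-evolution")
      .getOrCreate()

    // Hypothetical location; a glob or directory covers all the daily files in one read.
    val df = readAll(spark, "/data/events/*.avro")
    df.printSchema()

    spark.stop()
  }
}
```

Passing a directory or glob to a single load call keeps all 1000+ daily files in one read, rather than loading them one by one and unioning the DataFrames.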