I have parquet files which I need to read in Spark. Some of the files are missing a few columns that are present in the newer files.
Since I do not know which files are missing which columns, I need to read all of the files in Spark. I have a list of columns that I need to read, and it may also happen that every file is missing some of those columns. For any column that is missing from a file, I need to fill in null.
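For example (the column names and paths below are just placeholders, not my real ones), if my column list is ['id', 'val', 'new_col'] and an older file only contains id and val, I still want the rows from that file to come back with a new_col column full of nulls, i.e. roughly the effect of doing this per file:

from pyspark.sql.functions import lit
from pyspark.sql.types import StringType

# read one of the older files that is missing new_col
df_old = sqlContext.read.parquet('s3://bucket/old_file.parquet')

# add the missing column as null so it lines up with the newer files
df_old = df_old.withColumn('new_col', lit(None).cast(StringType()))

The problem is that I cannot do this file by file, because I do not know in advance which files are missing which columns.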
When I try to do a
sqlContext.sql('query')
it gives me an error saying that the columns are missing.
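The query is basically just a select of my column list from a temp table over the parquet data; roughly something like this (table and column names are placeholders):

df = sqlContext.read.parquet('s3://bucket/path/')
df.registerTempTable('events')

# fails because new_col cannot be resolved when the files read do not contain it
result = sqlContext.sql('SELECT id, val, new_col FROM events')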
If I define the schema and do a
sqlContext.read.schema(parquet_schema).parquet('s3://....')
It gives me the same error.
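parquet_schema here is just a StructType built from my full column list, roughly like this (column names and types are placeholders):

from pyspark.sql.types import StructType, StructField, LongType, StringType

parquet_schema = StructType([
    StructField('id', LongType(), True),
    StructField('val', StringType(), True),
    StructField('new_col', StringType(), True),
])

# same read as above, still errors on the missing columns
df = sqlContext.read.schema(parquet_schema).parquet('s3://bucket/path/')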
How can I read all of these files in Spark so that the columns missing from some files simply come back as null?