I have a Parquet file that Spark reads as an external table.
One of the columns is defined as int both in the Parquet schema and in the Spark table.
Recently I discovered that int is too small for my needs, so I changed the column type to long in the new Parquet files, and I also changed the type in the Spark table to bigint.
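For concreteness, the setup looks roughly like this (table name, column name, and location are made up for illustration; this is just a sketch of the situation):

```scala
// Sketch only: table name, column name, and path are hypothetical.
// `spark` is the usual SparkSession (as in spark-shell).

// External table definition after the type change; the old files under this
// location still store `id` as a 32-bit int, the new ones as a 64-bit long.
spark.sql("""
  CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
  STORED AS PARQUET
  LOCATION '/data/events'
""")

// Reading through the table now fails on the old files.
spark.table("events").show()
```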
However, when Spark reads an old Parquet file (with int) through the external table (with bigint), I get the following error:
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
One possible solution is altering the column type in the old Parquet files to long, which I asked about here: How can I change parquet column type from int to long? However, it is very expensive since I have a lot of data.
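For reference, that rewrite would look roughly like the snippet below (paths and the column name are hypothetical); it works, but it means rewriting every old file:

```scala
// Sketch of the rewrite approach: read the old files with their own (int) schema,
// cast the column to long, and write them out again. Paths are hypothetical.
import org.apache.spark.sql.functions.col

spark.read.parquet("/data/events_old")
  .withColumn("id", col("id").cast("long"))
  .write.mode("overwrite")
  .parquet("/data/events_rewritten")
```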
Another possible solution is to read each Parquet file into a separate Spark table according to its own schema and create a union view over the old and new tables, but that is quite ugly.
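Roughly, that union-view workaround would look like this (all names and paths are hypothetical); it keeps two table definitions around and has to be touched every time the schema changes again:

```scala
// Sketch of the union-view workaround: one table per file schema, plus a view
// that casts the old int column up to bigint. Names and paths are hypothetical.
spark.sql("CREATE EXTERNAL TABLE events_old (id INT)    STORED AS PARQUET LOCATION '/data/events/old'")
spark.sql("CREATE EXTERNAL TABLE events_new (id BIGINT) STORED AS PARQUET LOCATION '/data/events/new'")

spark.sql("""
  CREATE OR REPLACE VIEW events AS
  SELECT CAST(id AS BIGINT) AS id FROM events_old
  UNION ALL
  SELECT id FROM events_new
""")
```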
Is there another way to have Spark read an int Parquet column as long (bigint)?