We are working with Apache Spark; we save JSON files as gzip-compressed Parquet files in HDFS. However, when reading them back into a DataFrame, some files (but not all) raise the following exception:
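For reference, a minimal sketch of the pipeline described above (the question does not include the actual code, so the paths and session setup here are assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-to-parquet").getOrCreate()

// Write step: read the JSON files and persist them as
// gzip-compressed Parquet in HDFS (hypothetical paths).
val df = spark.read.json("hdfs:///data/input/*.json")
df.write
  .option("compression", "gzip")
  .parquet("hdfs:///data/output.parquet")

// Read step: this is where the exception below surfaces for some files,
// apparently when a stored column type (long) does not match the type
// the reader expects (double).
val back = spark.read.parquet("hdfs:///data/output.parquet")
back.show()
```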
ERROR Executor: Exception in task 2.0 in stage 72.0 (TID 88)
org.apache.parquet.io.ParquetDecodingException: Can not read value at 351 in
block 0 in file file:/path/to/file [...]
Caused by: java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to
org.apache.spark.sql.catalyst.expressions.MutableDouble
Any help is much appreciated!