We are working with Apache Spark; we save JSON files as gzip-compressed Parquet files in HDFS. However, when reading them back into a DataFrame, some files (but not all) raise the following exception:
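For reference, a minimal sketch of the pipeline described above (the question does not include the actual code, so the paths and session setup here are assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-to-parquet").getOrCreate()

// Write step: read the JSON files and persist them as
// gzip-compressed Parquet in HDFS (hypothetical paths).
val df = spark.read.json("hdfs:///data/input/*.json")
df.write
  .option("compression", "gzip")
  .parquet("hdfs:///data/output.parquet")

// Read step: this is where the exception below surfaces for some files,
// apparently when a stored column type (long) does not match the type
// the reader expects (double).
val back = spark.read.parquet("hdfs:///data/output.parquet")
back.show()
```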
ERROR Executor: Exception in task 2.0 in stage 72.0 (TID 88)
org.apache.parquet.io.ParquetDecodingException: Can not read value at 351 in
block 0 in file file:/path/to/file [...]
Caused by: java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to
org.apache.spark.sql.catalyst.expressions.MutableDouble
Any help is much appreciated!