
I have a directory containing two Parquet files with the same schema but a different column order. How does Spark decide the column order when reading the whole directory?

Input directory: [image]

DataFrame 1, read from 1.parquet: [image]

DataFrame 2, read from 2.parquet: [image]

Reading the complete directory: [image]


1 Answer


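The column order comes from the schema stored in each Parquet file's metadata; you can use a Parquet viewer to inspect each file. As far as I know, when Spark reads a whole directory without an explicit schema, the order it reports depends on which file's schema it infers from, so it is not guaranteed to match any particular file.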
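If no Parquet viewer is at hand, Spark itself can report the column order of each file. This is only a minimal sketch: the local session, the sample rows, and the names `InspectColumnOrder` and `columnOrder` are made up for illustration. Spark writes each sample as a directory of part files, so `1.parquet` and `2.parquet` here only stand in for the files from the question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object InspectColumnOrder {
  // Column names in the order Spark reads them from the Parquet data at `path`.
  def columnOrder(spark: SparkSession, path: String): Seq[String] =
    spark.read.parquet(path).columns.toSeq

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inspect-column-order")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: same columns, different order.
    val dir = Files.createTempDirectory("parquet-order").toString
    Seq((1, "alice")).toDF("id", "login").write.parquet(s"$dir/1.parquet")
    Seq(("bob", 2)).toDF("login", "id").write.parquet(s"$dir/2.parquet")

    // Print the column order of each dataset separately.
    Seq("1.parquet", "2.parquet").foreach { name =>
      println(s"$name: ${columnOrder(spark, s"$dir/$name").mkString(", ")}")
    }

    spark.stop()
  }
}
```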

You can also provide a schema when reading the Parquet files so that the column order is always the same.

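import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Explicit schema: the resulting DataFrame's columns follow this order.
val parquetSchema: StructType = new StructType()
  .add("id", IntegerType, true)
  .add("login", StringType, true)

spark.read.schema(parquetSchema).parquet(...)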
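To check this end to end, here is a rough sketch that writes two small datasets with opposite column orders and reads them back together. The sample rows, the temporary paths, and the names `FixedColumnOrder`, `readWithSchema` and `readThenSelect` are assumptions for illustration. `readThenSelect` is a different technique from the explicit schema: it lets Spark infer the schema and then reorders the columns by name with `select`.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object FixedColumnOrder {
  // The column order we want, independent of the order inside each file.
  val parquetSchema: StructType = new StructType()
    .add("id", IntegerType, true)
    .add("login", StringType, true)

  // Option 1: pass the schema to the reader, as in the answer above.
  def readWithSchema(spark: SparkSession, paths: String*): DataFrame =
    spark.read.schema(parquetSchema).parquet(paths: _*)

  // Option 2 (alternative): infer the schema, then reorder columns by name.
  def readThenSelect(spark: SparkSession, paths: String*): DataFrame =
    spark.read.parquet(paths: _*).select(parquetSchema.fieldNames.map(col): _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fixed-column-order")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: same columns, different order.
    val dir = Files.createTempDirectory("parquet-order").toString
    Seq((1, "alice")).toDF("id", "login").write.parquet(s"$dir/a")
    Seq(("bob", 2)).toDF("login", "id").write.parquet(s"$dir/b")

    // Both reads should list the columns in parquetSchema order.
    println(readWithSchema(spark, s"$dir/a", s"$dir/b").columns.mkString(", "))
    println(readThenSelect(spark, s"$dir/a", s"$dir/b").columns.mkString(", "))

    spark.stop()
  }
}
```

The explicit schema also skips schema inference. The `select` variant is handy when you only want to pin the order and let Spark pick up the types from the files.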