I'm using Spark and Scala to read some parquet files. The problem I am facing is that the content of these parquet files may vary: some fields are sometimes not present. So when I try to access a field that doesn't exist in a file, I get the following exception:
java.lang.IllegalArgumentException: Field "wrongHeaderIndicator" does not exist.
I did something similar in Java once, and there it was possible to use contains() or get(index) != null to check whether the field we are trying to access exists. But I am not able to do the same in Scala.
Below you can see what I have written so far and the four things I tried, without success.
//The part of reading the parquet file and accessing the rows works fine
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val parquetFileDF = sqlContext.read.parquet("myParquet.parquet")
//I get one of the six blocks in the parquet file
val myHeaderData = parquetFileDF.select("HeaderData").collectAsList()
//When I try to access a particular field which is not in the "HeaderData"
//I get the exception
//1st Try
Option(myHeaderData.get(0).getStruct(0).getAs[String]("wrongHeaderIndicator")) match {
  case Some(i) => println("This data exist")
  case None => println("This would be a null")
}
//2nd Try
if (myHeaderData.get(0).getStruct(0).getAs[String]("wrongHeaderIndicator") != null)
  println("This data exist")
else
  println("This is null")
//3rd Try
println(myHeaderData.get(0).getStruct(0).fieldIndex("wrongHeaderIndicator"))
//4th Try
println(Some(myHeaderData.get(0).getStruct(0).getAs[String]("wrongHeaderIndicator")))
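All four tries fail at the same point: in Spark's Row, getAs[T](fieldName) first resolves the name with fieldIndex, and fieldIndex throws when the name is not in the row's schema, so the Option/null checks around the call never get a chance to run. A minimal sketch that catches the failed lookup with scala.util.Try, assuming the Spark 2.x Row API used above; SafeFieldAccess and safeGetString are hypothetical names:

```scala
import org.apache.spark.sql.Row
import scala.util.Try

object SafeFieldAccess {
  // Hypothetical helper (not part of Spark): returns None when the name is
  // missing from the row's schema (the lookup throws and Try catches it)
  // or when the field exists but its value is null.
  def safeGetString(row: Row, fieldName: String): Option[String] =
    Try(row.getAs[String](fieldName)).toOption.flatMap(Option(_))
}
```

Note that Try swallows any non-fatal exception, not only the missing-field one (a ClassCastException from a non-string field would also become None), so an explicit check against the schema is the more precise option.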
Edit: The problem is not when I access the columns of the DataFrame. The columns are always the same, and I don't need to perform any checks before the select. The problem comes when I access the fields of the records in a particular column. Those records are structs whose schema you can see below:
The schema of the column myHeaderData is similar to:
|-- myHeaderData: struct (nullable = true)
| |-- myOpIndicator: string (nullable = true)
| |-- mySecondaryFlag: string (nullable = true)
| |-- myDownloadDate: string (nullable = true)
| |-- myDownloadTime: string (nullable = true)
| |-- myUUID: string (nullable = true)
And if I run
myHeaderData.get(0).getStruct(0).schema
I get the following output:
StructType(StructField(myOpIndicator,StringType,true), StructField(mySecondaryFlag,StringType,true), StructField(myDownloadDate,StringType,true), StructField(myDownloadTime,StringType,true), StructField(myUUID,StringType,true))
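Since each struct carries its schema (as the output above shows), the closest Scala counterpart to the Java contains() check is to test the schema's field names before calling getAs. A minimal sketch, assuming Spark's Row.schema and StructType.fieldNames; FieldPresence, hasField and getStringIfPresent are hypothetical names:

```scala
import org.apache.spark.sql.Row

object FieldPresence {
  // True if the row carries a schema that declares a field with this exact name.
  // Row.schema is null for rows built without a schema, hence the null check.
  def hasField(row: Row, fieldName: String): Boolean =
    row.schema != null && row.schema.fieldNames.contains(fieldName)

  // Hypothetical helper: reads the field only after confirming it exists,
  // so fieldIndex is never called with an unknown name. A null value
  // for an existing field also yields None.
  def getStringIfPresent(row: Row, fieldName: String): Option[String] =
    if (hasField(row, fieldName)) Option(row.getAs[String](fieldName)) else None
}
```

With the schema shown above, FieldPresence.getStringIfPresent(myHeaderData.get(0).getStruct(0), "wrongHeaderIndicator") would return None instead of throwing, because "wrongHeaderIndicator" is not among the struct's field names.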
All four attempts produce the same exception. Can anyone tell me what I can use to check whether a field exists in a struct without triggering the exception?
Thanks