
I am running the following Scala code:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.sql("SELECT * FROM hl7.all_index")
val rows = df.rdd
val firstStruct = rows.first.get(4)
// I know the column with index 4 IS a StructType
val fs = firstStruct.asInstanceOf[StructType]
// this cast fails
// what I'm trying to achieve is
log.println(fs.apply("name"))

I know that firstStruct is of StructType and that one of its StructFields is named "name", but it seems to fail when trying the cast. I've been told that Spark/Hive structs differ from Scala ones, but in order to use StructType I needed to

import org.apache.spark.sql.types._

so I assume they should actually be the same type.

I looked here: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala in order to see how to get to the StructField.

Thanks!

What exactly is your question? It seems unclear. - Kevin Voorn

1 Answer


Schema types are logical types. They don't map one-to-one to the type of the objects in a column with that schema type.

For example, Hive/SQL uses BIGINT for 64-bit integers while SparkSQL uses LongType, but the actual type of the data in Scala is Long. This is the issue you are having.
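
Here is a quick sketch of that distinction, using a hypothetical table some_db.people with a BIGINT column named id:

// the schema describes the column with a logical type
val people = hiveContext.sql("SELECT * FROM some_db.people")
people.schema("id").dataType  // org.apache.spark.sql.types.LongType

// but the value you pull out of a Row is a plain Scala Long
val id: Long = people.first.getAs[Long]("id")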

A struct in Hive (a StructType in SparkSQL) is represented by a Row in a DataFrame. So, what you want to do is one of the following:

row.getStruct(4)

or

import org.apache.spark.sql.Row
row.getAs[Row](4)
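
To get at the nested field from your example, something along these lines should work. This is a sketch, assuming the "name" field holds a string; adjust the type parameter to the actual field type:

import org.apache.spark.sql.types.StructType

val firstRow = df.rdd.first
// the struct column comes back as a Row, not a StructType
val struct = firstRow.getStruct(4)

// if the nested Row carries its schema, you can access the field by name
val name = struct.getAs[String]("name")

// otherwise, look the field's index up in the DataFrame's schema
val structSchema = df.schema(4).dataType.asInstanceOf[StructType]
println(struct.getString(structSchema.fieldIndex("name")))

The key point is that asInstanceOf[StructType] fails because the runtime object is a Row; StructType only describes the schema.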