I am reading the schema of a data frame from a text file. The file looks like this:
id,1,bigint
price,2,bigint
sqft,3,bigint
zip_id,4,int
name,5,string
and I am mapping the parsed data types to Spark SQL data types. The code for creating the data frame is:
var schemaSt = new ListBuffer[(String,String)]()
// read schema from file
for (line <- Source.fromFile("meta.txt").getLines()) {
val word = line.split(",")
schemaSt += ((word(0),word(2)))
}
// map datatypes
val types = Map("int" -> IntegerType, "bigint" -> LongType)
.withDefault(_ => StringType)
val schemaChanged = schemaSt.map(x => (x._1, types(x._2)))
// read data source
val lines = spark.sparkContext.textFile("data source path")
val fields = schemaChanged.map(x => StructField(x._1, x._2, nullable = true)).toList
val schema = StructType(fields)
val rowRDD = lines
.map(_.split("\t"))
.map(attributes => Row.fromSeq(attributes))
// Apply the schema to the RDD
val new_df = spark.createDataFrame(rowRDD, schema)
new_df.show(5)
new_df.printSchema()
but the above works only for StringType columns. For IntegerType and LongType, it throws exceptions:
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int
and
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint.
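I suspect the problem is that every value produced by split("\t") is still a String, while the schema declares int and bigint, so each field would need to be cast before building the Row. A sketch of what I think is needed (convert is my own hypothetical helper, not part of the Spark API):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical helper: cast a raw string to the JVM type the schema field expects.
def convert(raw: String, dt: DataType): Any = dt match {
  case IntegerType => raw.toInt
  case LongType    => raw.toLong
  case _           => raw // leave StringType (and anything else) as the raw string
}

// Pair each split value with its StructField and convert it before building the Row.
val rowRDD = lines
  .map(_.split("\t"))
  .map(attrs => Row.fromSeq(attrs.zip(schema.fields).map {
    case (raw, field) => convert(raw, field.dataType)
  }))
```

Is something along these lines the right direction, or is there a built-in way to do this conversion?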
Thanks in advance!