5 votes

I am trying to read JSON data in a Spark Streaming job. By default, sqlContext.read.json(rdd) infers every nested JSON object as a struct rather than a map:

 |-- legal_name: struct (nullable = true)
 |    |-- first_name: string (nullable = true)
 |    |-- last_name: string (nullable = true)
 |    |-- middle_name: string (nullable = true)
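
For reference, this is roughly how that schema comes out (the sample record below is made up just to show the shape of my data):

import org.apache.spark.sql.SQLContext

// `sc` is the SparkContext already available in the streaming job
val sqlContext = new SQLContext(sc)

// made-up record with the same shape as my data
val sample = sc.parallelize(Seq(
  """{"legal_name": {"first_name": "John", "middle_name": "A", "last_name": "Doe"}}"""
))

// schema inference turns the nested JSON object into a struct
sqlContext.read.json(sample).printSchema()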

But when I read the same data from a Hive table using sqlContext,

val a = sqlContext.sql("select * from student_record")

the schema comes back as:

 |-- leagalname: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Is there any way to read the data using read.json(rdd) and get a map data type?

Is there any option like spark.sql.schema.convertStructToMap?

Any help is appreciated.

1 Answer

2 votes

You need to explicitly define your schema when calling read.json.

You can read about the details in the Programmatically Specifying the Schema section of the Spark SQL documentation.

For example, in your specific case it would be:

import org.apache.spark.sql.types._

val schema = StructType(List(
  StructField("legal_name", MapType(StringType, StringType, valueContainsNull = true))
))

That gives you a single column, legal_name, typed as a map of string to string.

Once you have defined your schema, you can call sqlContext.read.schema(schema).json(rdd) to create your DataFrame from the JSON dataset with the desired schema.
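
Putting it together, here is a rough, self-contained sketch (the sample record and the SparkContext `sc` are assumptions; in your streaming job you would do the same thing inside foreachRDD on the DStream of JSON strings):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

val sqlContext = SQLContext.getOrCreate(sc) // `sc`: an existing SparkContext

val schema = StructType(List(
  StructField("legal_name", MapType(StringType, StringType, valueContainsNull = true))
))

// made-up record with the same shape as in the question
val rdd = sc.parallelize(Seq(
  """{"legal_name": {"first_name": "John", "middle_name": "A", "last_name": "Doe"}}"""
))

// applying the schema explicitly keeps legal_name as a map<string,string>
val df = sqlContext.read.schema(schema).json(rdd)
df.printSchema()
// root
//  |-- legal_name: map (nullable = true)
//  |    |-- key: string
//  |    |-- value: string (valueContainsNull = true)

Selecting df("legal_name")("first_name") then does a map lookup by key instead of struct field access.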