One of the JSON fields (AGE below) is meant to be a number, but it is null in every record and shows up as string in the DataFrame's printSchema output.
Input JSON file:
{"AGE":null,"NAME":"abc","BATCH":190}
{"AGE":null,"NAME":"abc","BATCH":190}
Spark code and output:
val df = spark.read.json("/home/white/tmp/a.json")
df.printSchema()
df.show()
*********************
OUTPUT
*********************
root
|-- BATCH: long (nullable = true)
|-- AGE: string (nullable = true)
|-- NAME: string (nullable = true)
+-----+----+----+
|BATCH| AGE|NAME|
+-----+----+----+
|  190|null| abc|
|  190|null| abc|
+-----+----+----+
I want AGE to be a long. Currently I achieve this by creating a new StructType with the AGE field as LongType and recreating the DataFrame with df.sqlContext.createDataFrame(df.rdd, newSchema). Can I get this done directly through the spark.read.json API instead?
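For reference, the workaround described above looks roughly like the sketch below. It is a minimal reconstruction, not the exact code: the in-memory lines dataset is an assumed stand-in for a.json so the snippet runs on its own, the local SparkSession setup is only needed outside spark-shell, and fixed is a name introduced here for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructType}

// In spark-shell `spark` already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("age-as-long").getOrCreate()
import spark.implicits._

// Assumption: an in-memory copy of the two records, standing in for a.json.
val lines = Seq(
  """{"AGE":null,"NAME":"abc","BATCH":190}""",
  """{"AGE":null,"NAME":"abc","BATCH":190}"""
).toDS()
val df = spark.read.json(lines)

// Copy the inferred schema, changing only the AGE field to LongType.
// Field order is kept so it still lines up with the rows in df.rdd.
val newSchema = StructType(df.schema.map {
  case f if f.name == "AGE" => f.copy(dataType = LongType)
  case f                    => f
})

// Rebuild the DataFrame from the same rows under the adjusted schema.
// This relies on every AGE value being null, as in the sample data.
val fixed = df.sqlContext.createDataFrame(df.rdd, newSchema)
fixed.printSchema()
```

This works, but it means reading the data with the inferred schema first and then rebuilding the DataFrame in a second step, which is what I would like to avoid.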