
I send a JSON request and receive a JSON response, and I am converting the JSON contents to a DataFrame, but the inferred schema is string for all attributes. Is there any way to apply my own custom schema to the JSON response stored in a Dataset, using Scala and Spark?

val inputStream = entity.getContent()
val content = scala.io.Source.fromInputStream(inputStream).getLines.mkString
inputStream.close()
httpClient.getConnectionManager().shutdown()
println(content)

val rootElem = "data"
var jsonDF: DataFrame = null
if (rootElem.equalsIgnoreCase("NULL")) {
  jsonDF = sqlContext.read.json(Seq(content).toDS)
} else {
  // explode the root array element and flatten its fields into columns
  jsonDF = sqlContext.read.json(Seq(content).toDS)
    .select(explode(col(rootElem)).as("child"))
    .select(col("child.*"))
}
jsonDF.show()
@Prateek this is for a JSON file, but in my case I am getting the response directly from a REST POST API and saving it to a DataFrame dynamically, and the data types do not match. A custom schema is applicable at the file level, but not when the response is stored dynamically into a DataFrame the way it is for JSON files. – Allforone

1 Answer


Create your own StructType (or build the StructType dynamically), then create the DataFrame from your JSON and that StructType. Example:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(
  List(
    StructField("num", IntegerType, true),
    StructField("letter", StringType, true)
  )
)

// `data` is a Seq[Row] whose rows match the schema, e.g. Seq(Row(1, "a"), Row(2, "b"))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)
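For the question's case, the same kind of StructType can be passed to the JSON reader, so the response string is parsed with the intended types instead of everything being inferred as string. A minimal sketch, assuming `content` holds the JSON response text and `spark` is the active SparkSession (the field names and types here are placeholders, not taken from the original post):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("json-schema").getOrCreate()
import spark.implicits._

// stand-in for the REST response body
val content = """{"num": 1, "letter": "a"}"""

// schema describing the expected types
val customSchema = StructType(List(
  StructField("num", IntegerType, true),
  StructField("letter", StringType, true)
))

// .schema(...) applies the custom schema instead of letting Spark infer strings
val jsonDF = spark.read.schema(customSchema).json(Seq(content).toDS)
jsonDF.printSchema()
```

With this, `num` comes back as integer rather than string, which is exactly the mismatch described in the comment above.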

Please check this to create the StructType dynamically: Dynamically build case class or schema
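One common way to build the schema dynamically (an illustrative sketch, not from the linked answer; the field list and `toDataType` helper are hypothetical) is to map a list of (name, type) pairs to StructFields:

```scala
import org.apache.spark.sql.types._

// hypothetical field description, e.g. read from a config or the API's metadata
val fields = Seq(("num", "int"), ("letter", "string"))

// map a type name onto the corresponding Spark DataType
def toDataType(t: String): DataType = t match {
  case "int"    => IntegerType
  case "string" => StringType
  case "double" => DoubleType
  case other    => throw new IllegalArgumentException(s"unsupported type: $other")
}

// assemble the StructType from the description
val dynamicSchema = StructType(fields.map { case (name, t) =>
  StructField(name, toDataType(t), nullable = true)
})
```

If you are on Spark 2.3+, `StructType.fromDDL("num INT, letter STRING")` is another option for building a schema from a DDL string.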