
I'm reading an .avro file in which one column holds binary data. I currently convert that column to a string with a UDF so that it is readable, and I then need to convert it to JSON so the data can be parsed further. Is there a way to convert the string to JSON using Spark with Scala?

Any help would be much appreciated.

import org.apache.spark.sql.functions.udf
import java.nio.charset.StandardCharsets

val avroDF = spark.read
  .format("com.databricks.spark.avro")
  .load("file:///C:/46.avro")

// Convert the binary column to a string (assuming the payload is UTF-8 encoded)
val toStringDF = udf((x: Array[Byte]) => new String(x, StandardCharsets.UTF_8))

val newDF = avroDF
  .withColumn("BODY", toStringDF(avroDF("body")))
  .select("BODY")
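For reference, the UDF may not be strictly necessary for this step: Spark can cast a binary column to a string directly, decoding the bytes as UTF-8. The following is only a sketch of that alternative; the helper name `bodyAsString` is hypothetical, and it assumes the input has a binary column named `body` as in the snippet above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: replaces the UDF with a built-in cast.
// Casting BinaryType to StringType decodes the bytes as UTF-8.
def bodyAsString(df: DataFrame): DataFrame =
  df.select(col("body").cast("string").as("BODY"))
```

Calling `bodyAsString(avroDF)` should give a single-column DataFrame comparable to `newDF`, provided the payload really is UTF-8 text.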

Output of newDF is shown below:

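+---------------------------------------------------------------------------------------------------------------+
|BODY                                                                                                           |
+---------------------------------------------------------------------------------------------------------------+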
|{"VIN":"FU74HZ501740XXXXX","MSG_TYPE":"SIGNAL","TT":0,"RPM":[{"E":1566800008672,"V":1073.75},{"E":1566800002538,"V":1003.625},{"E":1566800004084,"V":1121.75}

My desired output should look like this: (image not included)

Isn't your body already in JSON format? If you want to convert that JSON string into a proper DataFrame, then this may help. - Saurabh
Look at from_json to convert your JSON string to a proper DataFrame (without having to re-read the data with spark.read.json()). - sachav

1 Answer


I don't know whether you want a generic solution, but in your particular case you can write something like this:

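import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._ // provides the Encoder[String] needed by .as[String]

// Parse each JSON string (schema is inferred), then flatten the RPM array
spark.read.json(newDF.as[String])
    .withColumn("RPM", explode(col("RPM"))) // one row per array element
    .withColumn("E", col("RPM.E"))
    .withColumn("V", col("RPM.V"))
    .drop("RPM")
    .show()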
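
As suggested in the comments, from_json is another option: it parses the string column in place with an explicit schema, so Spark does not have to infer the schema by scanning the data. The sketch below is only an illustration. The schema is an assumption based on the fields visible in the truncated output above (the real messages may contain more), the sample row is a shortened, hypothetical stand-in for newDF, and the names `FromJsonSketch` and `parseBody` are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

object FromJsonSketch {
  // Assumed schema: only the fields visible in the question's output.
  private val rpmType = new StructType()
    .add("E", LongType)
    .add("V", DoubleType)

  private val bodySchema = new StructType()
    .add("VIN", StringType)
    .add("MSG_TYPE", StringType)
    .add("TT", LongType)
    .add("RPM", ArrayType(rpmType))

  // Expects a DataFrame with a string column named BODY holding JSON text.
  def parseBody(df: DataFrame): DataFrame =
    df.select(from_json(col("BODY"), bodySchema).as("j"))
      // explode yields one row per element of the RPM array
      .select(col("j.VIN"), col("j.MSG_TYPE"), col("j.TT"), explode(col("j.RPM")).as("RPM"))
      .select(col("VIN"), col("MSG_TYPE"), col("TT"), col("RPM.E").as("E"), col("RPM.V").as("V"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row standing in for newDF.
    val sample = Seq(
      """{"VIN":"FU74HZ501740XXXXX","MSG_TYPE":"SIGNAL","TT":0,"RPM":[{"E":1566800008672,"V":1073.75},{"E":1566800002538,"V":1003.625}]}"""
    ).toDF("BODY")

    parseBody(sample).show(false)
    spark.stop()
  }
}
```

In the question's setting, `FromJsonSketch.parseBody(newDF)` would take the place of the spark.read.json call. Fields that are missing from the schema are simply dropped, and rows that fail to parse come back as nulls, so the schema should be checked against real messages first.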