I am getting data from a blob location into a dataframe as below.
| NUM_ID| Event|
+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|XXXXX01|[{"SN":"SIG1","E":1571599398000,"V":19.79},{"SN":"SIG1","E":1571599406000,"V":19.80},{"SN":"SIG2","E":1571599406000,"V":25.30},{...|
|XXXXX02|[{"SN":"SIG1","E":1571599414000,"V":19.79},{"SN":"SIG2","E":1571599414000,"V":19.80},{"SN":"SIG2","E":1571599424000,"V":25.30},{...|
If we take a single row it will be as below.
|XXXXX01|[{"SN":"SIG1","E":1571599398000,"V":19.79},{"SN":"SIG1","E":1571599406000,"V":19.80},{"SN":"SIG1","E":1571599414000,"V":19.20},{"SN":"SIG2","E":1571599424000,"V":25.30},{"SN":"SIG2","E":1571599432000,"V":19.10},{"SN":"SIG3","E":1571599440000,"V":19.10},{"SN":"SIG3","E":1571599448000,"V":19.10},{"SN":"SIG3","E":1571599456000,"V":19.10},{"SN":"SIG3","E":1571599396000,"V":19.79},{"SN":"SIG3","E":1571599404000,"V":19.79}]
Event column is having different signals as E,V pair.
The schema for this dataframe is as shown below.
scala> df.printSchema
root
|-- NUM_ID: string (nullable = true)
|-- Event: string (nullable = true)
I want to take some signals(suppose i need only SIG1 and SIG3) along with E,V pairs as a new column as shown below.
+-------+--------+--------------+------+
| NUM_ID| Event| E| V|
+-------+--------+--------------+------+
|XXXXX01| SIG1| 1571599398000| 19.79|
|XXXXX01| SIG1| 1571599406000| 19.80|
|XXXXX01| SIG1| 1571599414000| 19.20|
|XXXXX01| SIG3| 1571599440000| 19.10|
|XXXXX01| SIG3| 1571599448000| 19.10|
|XXXXX01| SIG3| 1571599406000| 19.10|
|XXXXX01| SIG3| 1571599396000| 19.70|
|XXXXX01| SIG3| 1571599404000| 19.70|
+-------+--------+--------------+------+
and the final output should be as like below for each NUM_ID.
+-------+--------------+------+------+
| NUM_ID| E|SIG1 V|SIG3 V|
+-------+--------------+------+------+
|XXXXX01| 1571599398000| 19.79| null|
|XXXXX01| 1571599406000| 19.80| 19.70|
|XXXXX01| 1571599414000| 19.20| null|
|XXXXX01| 1571599440000| null| 19.10|
|XXXXX01| 1571599448000| null| 19.10|
|XXXXX01| 1571599448000| null| 19.10|
|XXXXX01| 1571599406000| 19.80| 19.10|
|XXXXX01| 1571599396000| null| 19.70|
|XXXXX01| 1571599404000| null| 19.70|
+-------+--------------+------+------+
Appreciate any leads. Thanks in Advance!
val schema = ArrayType(StructType(Seq(StructField("SN", StringType), StructField("E", StringType), StructField("V", StringType))))df.withColumn("sig_array", from_json($"Event", schema))And obtaining [SN, E, V ] as a new column, now trying to explode that column! - Antonyfrom_jsonif json has a predefined schema orget_json_objectif you dont know the schema as shown here - abiratsis