I have a DataFrame with columns col1, col2 and col3. col1 and col2 are strings; col3 is a Map[String, String] with the following schema:
 |-- col3: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
I grouped by col1 and col2 and aggregated col3 with collect_list, which gives an array of maps that I store in col4:
df.groupBy($"col1", $"col2").agg(collect_list($"col3").as("col4"))
 |-- col4: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
However, I would like col4 to be a single map that combines all of the maps. Currently I get:
[[a->a1,b->b1],[c->c1]]
Expected output:
[a->a1,b->b1,c->c1]
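To pin down the intended semantics, here is a minimal plain-Scala sketch (no Spark involved) of the merge itself. `mergeMaps` is a hypothetical helper name, and the sketch assumes that when the same key appears in several maps, the value from the later map should win, which is what `++` does for Scala maps:

```scala
// Hypothetical helper: combine a sequence of maps into one, left to right.
// With Map ++, a key present in several maps keeps the value from the last one.
def mergeMaps(maps: Seq[Map[String, String]]): Map[String, String] =
  maps.foldLeft(Map.empty[String, String])(_ ++ _)

// The "currently I get" value from above
val current = Seq(Map("a" -> "a1", "b" -> "b1"), Map("c" -> "c1"))

// combined == Map("a" -> "a1", "b" -> "b1", "c" -> "c1")
val combined = mergeMaps(current)
println(combined)
```

Starting the fold from an empty map (instead of using `reduce`) means an empty input yields an empty map rather than throwing an exception.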
Would a UDF be the best approach here?
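One possible approach is sketched below, assuming Spark 2.4 or later with the Scala API and a local `SparkSession`. The sample rows, the `x`/`y` grouping values and the name `mergeMapsUdf` are made up for illustration. The sketch shows a UDF that folds the output of `collect_list` into one map. For comparison, it also shows a UDF-free variant that explodes each map into key/value rows before grouping, then rebuilds a single map with `map_from_entries`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, explode, map_from_entries, struct, udf}

// Local session for trying this out (assumption: Spark 2.4+ on the classpath)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("merge-collected-maps")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows reproducing the example in the question
val df = Seq(
  ("x", "y", Map("a" -> "a1", "b" -> "b1")),
  ("x", "y", Map("c" -> "c1"))
).toDF("col1", "col2", "col3")

// Variant 1: UDF that folds the array produced by collect_list into one map.
// On duplicate keys, the map that comes later in the array wins.
val mergeMapsUdf = udf { (maps: Seq[Map[String, String]]) =>
  maps.foldLeft(Map.empty[String, String])(_ ++ _)
}

val viaUdf = df
  .groupBy($"col1", $"col2")
  .agg(mergeMapsUdf(collect_list($"col3")).as("col4"))

// Variant 2: no UDF. explode() on a map column yields "key" and "value" columns;
// collect them as structs and rebuild a single map per group.
val viaBuiltins = df
  .select($"col1", $"col2", explode($"col3"))
  .groupBy($"col1", $"col2")
  .agg(map_from_entries(collect_list(struct($"key", $"value"))).as("col4"))

viaUdf.show(truncate = false)
viaBuiltins.show(truncate = false)

// Collect before stopping the session so the results stay available
val udfRows = viaUdf.collect()
val builtinRows = viaBuiltins.collect()
spark.stop()
```

A few caveats apply to both variants. The order of elements produced by `collect_list` after a shuffle is not guaranteed, so if a key appears in more than one map, which value survives is effectively arbitrary. On Spark 3.0+, `map_from_entries` raises an error on duplicate keys unless `spark.sql.mapKeyDedupPolicy` is set to `LAST_WIN`. The explode variant also drops rows whose map is null or empty, so a group made up only of such rows disappears from the result. The UDF variant instead returns an empty map for that group.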
Any help is appreciated. Thanks.