My current issue is the following:
Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'mapField' cannot be used as a grouping expression because its data type map<string,string> is not an orderable data type.;;
What I'm trying to achieve is basically to group the entries of a DataFrame by a given set of columns, but the grouping fails on MapType columns such as the one in the error above.
.groupBy(
...
"mapField",
...
)
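For reference, here's a minimal sketch that reproduces the error (the data and column names here are illustrative, not my real job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Grouping directly on a map-typed column throws the AnalysisException
// above, since map<string,string> is not an orderable data type
val df = Seq(("x", Map("k" -> "v"))).toDF("id", "mapField")
df.groupBy("mapField").count().show()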
I've got a couple of ideas, but there must be a much easier solution to this problem than the following ones I've thought about:
1. I've got the key/value pairs of each element saved as a concatenated string within the DataFrame, so I could maybe parse that string into a Map and then save it using withColumn, but I haven't found any approach that works and couldn't get my own attempt working either. Is this reasonable to do? (A rough sketch of what I mean follows this list.)
2. Reparse into an RDD, group it there, then go back to a DataFrame (too much hassle, I think).
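This is roughly what I had in mind for the first idea. The column name rawPairs and the "a:1,b:2" format are just illustrative, not my actual schema:

import org.apache.spark.sql.functions.expr

// Hypothetical: the pairs live in a string column "rawPairs" shaped like
// "a:1,b:2,c:3"; str_to_map is a Spark SQL built-in that splits such a
// string into a map<string,string>
val withMap = df.withColumn("myMap", expr("str_to_map(rawPairs, ',', ':')"))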
EDIT
Example input

id       | myMap
'sample' | Map('a' -> 1, 'b' -> 2, 'c' -> 3)

Desired output

id       | a | b | c
'sample' | 1 | 2 | 3
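In case it helps frame answers, the closest I can picture is something with explode plus pivot, along these lines (untested sketch; assumes the map values are plain scalars):

import org.apache.spark.sql.functions.{explode, first}
import spark.implicits._  // assumes an active SparkSession named spark

// explode turns the map into one row per entry, with columns key/value;
// pivoting on key then lifts each map key into its own column
val flattened = df
  .select($"id", explode($"myMap"))
  .groupBy("id")
  .pivot("key")
  .agg(first("value"))

flattened.show()
// +------+---+---+---+
// |    id|  a|  b|  c|
// +------+---+---+---+
// |sample|  1|  2|  3|
// +------+---+---+---+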