I have a dataframe like the one below, and I am trying to get the max(sum) per user, grouped by name.
+-----+-----------------------------+
|name |nt_set                       |
+-----+-----------------------------+
|Bob  |[av:27.0, bcd:29.0, abc:25.0]|
|Alice|[abc:95.0, bcd:55.0]         |
|Bob  |[abc:95.0, bcd:70.0]         |
|Alice|[abc:125.0, bcd:90.0]        |
+-----+-----------------------------+
Below is the UDF I am using to get the max(sum) for each user:
val maxfunc = udf((arr: Array[String]) => {
  val step1 = arr.map(x => (x.split(":", -1)(0), x.split(":", -1)(1)))
    .groupBy(_._1).mapValues(arr => arr.map(_._2.toInt).sum).maxBy(_._2)
  val result = step1._1 + ":" + step1._2
  result
})
When I run the UDF as shown below, it throws the following error:
val c6 = c5.withColumn("max_nt", maxfunc(col("nt_set"))).show(false)
Error: Failed to execute user defined function($anonfun$1: (array) =>string)
How can I achieve this in a more performant way? I need to run it on a much larger dataset.
The expected result is:
+-----+-----------------------------+
|name |max_nt                       |
+-----+-----------------------------+
|Bob  |abc:120.0                    |
|Alice|abc:220.0                    |
+-----+-----------------------------+
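To make the intended computation concrete, here is a minimal sketch in plain Scala collections (no Spark involved) of what I mean by max(sum): for each name, sum the values per key across all of that user's rows, then keep the key with the largest total. The object name MaxSumSketch and the helper maxSumPerName are hypothetical names used only for illustration, and the rows are copied from the sample dataframe above. It assumes every entry has the form key:value with a numeric value.

```scala
// Plain-Scala illustration of the intended logic; this is not Spark code.
object MaxSumSketch {

  // Hypothetical helper: for each name, sum the values per key across all
  // of that name's rows, then keep the key with the largest total.
  def maxSumPerName(rows: Seq[(String, Seq[String])]): Map[String, String] =
    rows.groupBy(_._1).map { case (name, grouped) =>
      val totals = grouped
        .flatMap(_._2)                        // all "key:value" entries for this name
        .map { entry =>
          val parts = entry.split(":", 2)     // assumes exactly one key and one value
          parts(0) -> parts(1).toDouble       // values such as "27.0" are doubles, not ints
        }
        .groupBy(_._1)
        .map { case (key, kvs) => key -> kvs.map(_._2).sum }
      val (bestKey, bestSum) = totals.maxBy(_._2)
      name -> s"$bestKey:$bestSum"
    }

  // Sample rows copied from the dataframe above.
  val rows: Seq[(String, Seq[String])] = Seq(
    "Bob"   -> Seq("av:27.0", "bcd:29.0", "abc:25.0"),
    "Alice" -> Seq("abc:95.0", "bcd:55.0"),
    "Bob"   -> Seq("abc:95.0", "bcd:70.0"),
    "Alice" -> Seq("abc:125.0", "bcd:90.0")
  )

  val result: Map[String, String] = maxSumPerName(rows)

  def main(args: Array[String]): Unit =
    result.foreach { case (name, maxNt) => println(s"$name -> $maxNt") }
}
```

On the sample rows this gives abc:120.0 for Bob (25.0 + 95.0) and abc:220.0 for Alice (95.0 + 125.0), which matches the expected table. What I am looking for is the equivalent of this on a Spark dataframe.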