I am trying to sum columns in the following DataFrame in Spark/Scala; the frame was itself created from another DataFrame. I was using this answer as a guide: How to sum the values of one column of a dataframe in spark/scala
Here's my data, produced by another aggregation and assigned to a DataFrame:
+-------------+----+----+
|activityLabel| 1_3|4_12|
+-------------+----+----+
| 12|1075| 0|
| 1| 0|3072|
| 6|3072| 0|
| 3| 0|3072|
| 5|3072| 0|
| 9|3072| 0|
| 4|3072| 0|
| 8|3379| 0|
| 7|3072| 0|
| 10|3072| 0|
| 11|3072| 0|
| 2| 0|3072|
+-------------+----+----+
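In case it helps to reproduce, the frame above can be rebuilt directly without my upstream pipeline (this is just a test harness, not my real code; it assumes a SparkSession named spark and import spark.implicits._):

// Rebuild the aggregated frame by hand from the rows shown above
val reproduced = Seq(
  (12, 1075, 0), (1, 0, 3072), (6, 3072, 0), (3, 0, 3072),
  (5, 3072, 0), (9, 3072, 0), (4, 3072, 0), (8, 3379, 0),
  (7, 3072, 0), (10, 3072, 0), (11, 3072, 0), (2, 0, 3072)
).toDF("activityLabel", "1_3", "4_12")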
And here's the code that creates that DataFrame:
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{sum, when}
import spark.implicits._ // assumes the active SparkSession is in scope as spark; needed for the $"..." syntax

def createRangeActivityLabels(df: DataFrame): Unit = {
  val activityRange: List[(Int, Int)] = List((1, 3), (4, 12))
  // One aggregate expression per range: for each group, count the rows whose
  // activityLabel falls outside the range, aliased e.g. "1_3"
  val exprs: List[Column] = activityRange.map { case (x, y) =>
    val newLabel = s"${x}_${y}"
    sum(when($"activityLabel".between(x, y), 0).otherwise(1)).alias(newLabel)
  }
  val df3: DataFrame = df.groupBy($"activityLabel").agg(exprs.head, exprs.tail: _*)
  df3.show
And here's the code to get the sum. All I want to do is sum the columns labelled 1_3 (exprs.head) and 4_12 (exprs(1)):
  val indexedLabel0: Int = df3.agg(sum(exprs.head)).first.getAs[Int](0)
}
I get the following error: org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
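As far as I can tell, exprs.head is itself a sum(...) expression, so df3.agg(sum(exprs.head)) ends up nesting one aggregate inside another, which is exactly what the exception complains about. After the groupBy/agg, though, 1_3 and 4_12 are just ordinary numeric columns of df3, so I assume the outer sum should refer to the column by its alias rather than reuse the expression. A minimal sketch of the direction I think a fix should take (untested; getLong is my assumption, since sum over an integer column comes back as a Long rather than an Int):

// Sum the already-aggregated column by name instead of nesting
// the original aggregate expression inside another sum
val total1_3: Long = df3.agg(sum($"1_3")).first.getLong(0)
val total4_12: Long = df3.agg(sum($"4_12")).first.getLong(0)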
I have tried multiple variations on this, but nothing seems to work. All ideas appreciated. Thanks!