I want to compute what is essentially a SUMPRODUCT across the columns of a Spark DataFrame. I have a DataFrame that looks like this:
id   val1  val2  val3  val4
123  10    5     7     5
I also have a Map that looks like:
val coefficients = Map("val1" -> 1, "val2" -> 2, "val3" -> 3, "val4" -> 4)
I want to take the value in each column of the DataFrame, multiply it by the corresponding value from the map, sum the products, and return the result in a new column. For the row above, that would be:
(10*1) + (5*2) + (7*3) + (5*4) = 61
I tried this:
val myDF1 = myDF.withColumn("mySum", {var a:Double = 0.0; for ((k,v) <- coefficients) a + (col(k).cast(DoubleType)*coefficients(k));a})
but got an error saying that the "+" method was overloaded. Even if I solved that, I'm not sure this would work. I could always build a SQL query dynamically as a text string and do it that way, but I was hoping for something a little more elegant.
Any ideas are appreciated.
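For reference, here is a minimal sketch of one way the calculation described above could be expressed without building SQL text: fold over the map and accumulate a single Column expression, starting from lit(0.0) instead of a Scala Double. The object name SumProductSketch, the helper weightedSum, and the local SparkSession are all assumptions made for illustration; only myDF, coefficients, and mySum come from the question. Part of the reason the attempt above fails is that `a` is a Double, and Double has no `+` variant that accepts a Column; the loop also never assigns the result back to `a`.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.DoubleType

// Hypothetical wrapper object; the name is illustrative only.
object SumProductSketch {

  // Builds one Column expression equal to the sum of (column * coefficient).
  // Starting from lit(0.0) keeps the accumulator a Column, so "+" resolves
  // to Column's "+" and an empty map simply yields 0.0.
  def weightedSum(coefficients: Map[String, Int]): Column =
    coefficients.foldLeft(lit(0.0)) { case (acc, (name, weight)) =>
      acc + col(name).cast(DoubleType) * lit(weight)
    }

  def main(args: Array[String]): Unit = {
    // Local session assumed for the sketch; a real job would reuse its own.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sumproduct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample row and map taken from the question.
    val myDF = Seq((123, 10, 5, 7, 5)).toDF("id", "val1", "val2", "val3", "val4")
    val coefficients = Map("val1" -> 1, "val2" -> 2, "val3" -> 3, "val4" -> 4)

    val myDF1 = myDF.withColumn("mySum", weightedSum(coefficients))
    myDF1.show()

    spark.stop()
  }
}
```

Because the fold only builds an expression, nothing is evaluated until an action such as show() runs, and Spark evaluates the whole sum as a single column expression per row.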