1
votes

In Spark, how can I create a tuple from a row in the following form?

(Col1, Col2, Col3, (Col4 + Col5 + Col6))

I have 400+ dynamically generated column names. I don't want to do this aggregation in the database, so select col1, col2, col3, (col4+col5+col6) is not a solution. I'm using Cassandra as the datastore.


2 Answers

4
votes

In general, I think you have the right idea, so my suggestion here is just syntactic sugar:

df
 .map{row => (row(0), row(1), row(2), (3 until row.length).map(row.getLong(_)).sum)}
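
The same slice-and-sum pattern can be tried without a cluster. This is only a minimal sketch, assuming a plain `Seq[Any]` stands in for a Spark `Row`; the object name `SumTail`, the helper `toTuple`, and the sample values are hypothetical and not part of the question.

```scala
object SumTail {
  // Hypothetical stand-in for a Row: three leading columns followed by Long columns.
  def toTuple(row: Seq[Any]): (Any, Any, Any, Long) = {
    // Sum every value from index 3 onward; assumes all of those values are Long.
    val total = row.drop(3).map(_.asInstanceOf[Long]).sum
    (row(0), row(1), row(2), total)
  }

  def main(args: Array[String]): Unit = {
    val sample: Seq[Any] = Seq("a", "b", "c", 1L, 2L, 3L)
    println(toTuple(sample)) // prints (a,b,c,6)
  }
}
```

With a real `Row`, `row.toSeq` returns a `Seq[Any]`, so `row.toSeq.drop(3)` should behave the same way as `row.drop(3)` does here.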
0
votes

I solved it using the code below. However, I'm still looking for a shorter answer, maybe with some syntactic sugar.

df.map(x => {
  // Sum every column from index 3 onward; assumes those columns hold Long values
  var sum: Long = 0
  for (i <- 3 until x.length)
    sum = sum + x(i).asInstanceOf[Long]
  (x(0), x(1), x(2), sum)
}).collect()
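
One caveat with `asInstanceOf[Long]`: it only works if every summed column really is a Long, and a value that comes back as an Int would throw a `ClassCastException`. If the dynamically generated columns might mix numeric types, something like the following could be used instead. It is a sketch under that assumption, again with a plain `Seq[Any]` standing in for the row; `TolerantSum`, `asLong`, and `sumFrom` are hypothetical names.

```scala
object TolerantSum {
  // Widen any boxed numeric value to Long; null is counted as 0.
  // Note: fractional values (Float, Double) are truncated by longValue().
  def asLong(v: Any): Long = v match {
    case n: java.lang.Number => n.longValue()
    case null                => 0L
    case other               => throw new IllegalArgumentException(s"not numeric: $other")
  }

  // Sum all values of the row from index `start` onward.
  def sumFrom(row: Seq[Any], start: Int): Long =
    row.drop(start).map(asLong).sum
}
```

Whether treating null as 0 is right depends on the data; throwing or skipping the row may be a better choice in some cases.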