I need to get all the columns along with a count, using a Scala RDD.
col1 col2 col3 col4
us A Q1 10
us A Q3 10
us A Q2 20
us B Q4 10
us B Q5 20
uk A Q1 10
uk A Q3 10
uk A Q2 20
uk B Q4 10
uk B Q5 20
I want a result like:
col1 col2 col3 col4 count
us A Q1 10 3
us A Q3 10 3
us A Q2 20 3
us B Q4 10 2
us B Q5 20 2
uk A Q1 10 3
uk A Q3 10 3
uk A Q2 20 3
uk B Q4 10 2
uk B Q5 20 2
This is like a group-by on col1 and col2 that produces a count per group, except that I also need col3 and col4 in the output.
I am trying this with a Scala RDD:
val Top_RDD_1 = RDD.groupBy(f => (f._1, f._2)).mapValues(_.toList)
This produces:
RDD[((String, String), List[(String, String, String, Double, Double, Double)])]
That is, each element is ((col1, col2), List((col1, col2, col3, col4))), for example ((us, A), List((us, A, Q1, 10), (us, A, Q3, 10), (us, A, Q2, 20))).
How can I get the size of each list and access the values inside it?
Please help me with the Spark Scala RDD code.
Thanks, Balaji.
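One possible way to go from the grouped RDD to the desired rows is to take the size of each group and emit every row of the group with that size appended. The following is only a minimal sketch under some assumptions: rows are taken to be 4-field tuples (col1, col2, col3, col4) as in the sample table, although the real RDD appears to hold six fields, and the names GroupCountSketch, Row, and withGroupCount are hypothetical, introduced here for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupCountSketch {
  // Assumed row shape, taken from the sample table: (col1, col2, col3, col4).
  type Row = (String, String, String, Double)

  // Append to every row the number of rows that share its (col1, col2) key.
  def withGroupCount(rows: RDD[Row]): RDD[(String, String, String, Double, Int)] =
    rows
      .groupBy(r => (r._1, r._2))        // RDD[((col1, col2), Iterable[Row])]
      .flatMap { case (_, group) =>
        val n = group.size               // the "list count" for this key
        // Access each value in the group and emit it with the count attached.
        group.map { case (c1, c2, c3, c4) => (c1, c2, c3, c4, n) }
      }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch can run standalone.
    val conf = new SparkConf().setAppName("group-count-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rows = sc.parallelize(Seq(
        ("us", "A", "Q1", 10.0), ("us", "A", "Q3", 10.0), ("us", "A", "Q2", 20.0),
        ("us", "B", "Q4", 10.0), ("us", "B", "Q5", 20.0),
        ("uk", "A", "Q1", 10.0), ("uk", "A", "Q3", 10.0), ("uk", "A", "Q2", 20.0),
        ("uk", "B", "Q4", 10.0), ("uk", "B", "Q5", 20.0)
      ))
      // Row order in the output is not guaranteed after the shuffle.
      withGroupCount(rows).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

groupBy holds every row of a key in memory on one executor, so for very large groups it may be preferable to count with map(r => ((r._1, r._2), 1)).reduceByKey(_ + _) and join the counts back onto the rows keyed the same way.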