0
votes

I have two RDDs (Spark, Scala) like the ones below:

rdd1 is Array((1,Array(1,2,3)),(2,Array(1,2,4)))

rdd2 is Array((1,Array(4,5,6)),(2,Array(3,5,6)))

First, I have to produce the Cartesian product of the two RDDs and then, for each pair, sum the element-wise products of the two arrays.

For example, the Cartesian product looks like this:

((11,(Array(1,2,3),Array(4,5,6))),(12,(Array(1,2,3),Array(3,5,6))),(21,(Array(1,2,4),Array(4,5,6))),(22,(Array(1,2,4),Array(3,5,6))))

The Cartesian product with the summation looks like this:

Array((11,1*4+2*5+3*6),(12,1*3+2*5+3*6),(21,1*4+2*5+4*6),(22,1*3+2*5+4*6))

I have tried the Cartesian product like this:

scala> val cart=rdd1.cartesian(rdd2)

But I am getting a result like:

Array[((Int, Array[Double]), (Int, Array[Double]))] i.e. 

(((1,(Array(1,2,3))),(1,Array(4,5,6))),
  ((1,(Array(1,2,3))),(2,Array(3,5,6))),
  ((2,(Array(1,2,4))),(1,Array(4,5,6))),
  ((2,(Array(1,2,4))),(2,Array(3,5,6)))
  )

Please help me achieve the following:

Array((11,1*4+2*5+3*6),(12,1*3+2*5+3*6),(21,1*4+2*5+4*6),(22,1*3+2*5+4*6))

1 Answer

1
votes

You simply need to map over the Cartesian product, pair up the keys, and calculate the inner product of the two arrays:

rdd1.cartesian(rdd2).map{ 
    case ((k1, v1), (k2, v2)) => (k1, k2) -> v1.zip(v2).map(x => x._1 * x._2).reduce(_ + _) 
}.collect

// res5: Array[((Int, Int), Int)] = Array(((1,1),32), ((1,2),31), ((2,1),38), ((2,2),37))
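
If you want the concatenated keys from the question (11, 12, 21, 22) rather than key pairs, here is a minimal sketch of the same idea using plain Scala collections, so it runs without a Spark context. The names `CartesianDot`, `dot`, `left`, and `right` are made up for illustration, and joining the keys via string concatenation is only one assumed reading of how 11, 12, ... are meant to be formed. With real RDDs, the same `case ((k1, v1), (k2, v2))` body should carry over to `rdd1.cartesian(rdd2).map { ... }`.

```scala
object CartesianDot {
  // Inner (dot) product of two arrays; zip truncates to the shorter one.
  def dot(a: Array[Int], b: Array[Int]): Int =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Local stand-ins for rdd1 and rdd2 (hypothetical names).
  val left: Seq[(Int, Array[Int])]  = Seq(1 -> Array(1, 2, 3), 2 -> Array(1, 2, 4))
  val right: Seq[(Int, Array[Int])] = Seq(1 -> Array(4, 5, 6), 2 -> Array(3, 5, 6))

  // Cartesian product via a for-comprehension; keys are joined as digits
  // (assumption: key 1 and key 2 become 12).
  val result: Seq[(Int, Int)] =
    for {
      (k1, v1) <- left
      (k2, v2) <- right
    } yield (s"$k1$k2".toInt, dot(v1, v2))

  def main(args: Array[String]): Unit =
    result.foreach(println)
}
```

Note that string concatenation becomes ambiguous once keys have more than one digit (1 and 12 versus 11 and 2 both give 112), so the tuple key `(k1, k2)` used above is the safer choice in general.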