I am stuck on a particular piece of Scala/Spark syntax, and I am hoping you can point me in the right direction.
If RDD1 is of type RDD[((Float, Float, Float), Long)], so that
RDD1.collect = Array(((x1,y1,z1),1), ((x2,y2,z2),2), ((x3,y3,z3),3), ...)
and RDD2 holds indices and is of type RDD[Long], so that
RDD2.collect = Array(1, 3, 5, ...)
What is the best way to extract the values from RDD1 whose indices occur in RDD2? That is, the output should be Array(((x1,y1,z1),1), ((x3,y3,z3),3), ((x5,y5,z5),5), ...)
Both RDD1 and RDD2 are large enough that I would like to avoid using .collect. Otherwise, the problem would simply be finding the intersecting elements of two Scala arrays/lists.
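To make the intended result concrete, here is a minimal sketch of the local (non-Spark) version on plain Scala arrays. The sample values and the names LocalIndexFilter, points, indices and selectByIndex are made up for illustration; what I am after is the distributed equivalent of this that works on the RDDs without collecting them.

```scala
object LocalIndexFilter {
  // Hypothetical sample data standing in for RDD1.collect
  val points: Array[((Float, Float, Float), Long)] = Array(
    ((1.0f, 1.5f, 2.0f), 1L),
    ((2.0f, 2.5f, 3.0f), 2L),
    ((3.0f, 3.5f, 4.0f), 3L)
  )

  // Hypothetical sample data standing in for RDD2.collect
  val indices: Array[Long] = Array(1L, 3L, 5L)

  // Keep only the elements of `data` whose index appears in `wanted`.
  // Index 5 has no matching element here, so it contributes nothing.
  def selectByIndex(
      data: Array[((Float, Float, Float), Long)],
      wanted: Array[Long]
  ): Array[((Float, Float, Float), Long)] = {
    val wantedSet = wanted.toSet // set lookup instead of scanning the array each time
    data.filter { case (_, idx) => wantedSet.contains(idx) }
  }

  def main(args: Array[String]): Unit =
    selectByIndex(points, indices).foreach(println)
}
```

With this sample data only the elements with indices 1 and 3 are kept.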
Thank you so much for your help!