What I want to achieve
I'm working with Spark and Scala. I have two Pair RDDs.
rdd1 : RDD[(String, List[String])]
rdd2 : RDD[(String, List[String])]
Both RDDs are joined on their first value.
val joinedRdd = rdd1.join(rdd2)
So the resulting RDD is of type RDD[(String, (List[String], List[String]))]
. I want to map this RDD and extract the elements of both lists, so that the resulting RDD contains just these elements of the two lists.
Example
rdd1 (id, List(a, b))
rdd2 (id, List(d, e, f))
wantedResult (a, b, d, e, f)
Naive approach
My naive approach would be to adress each element directly with (i)
, like below:
val rdd = rdd1.join(rdd2)
.map({ case (id, lists) =>
(lists._1(0), lists._1(1), lists._2(0), lists._2(2), lists._2(3)) })
/* results in RDD[(String, String, String, String, String)] */
Is there a way to get the elements of each list, without adressing each individually? Something like "lists._1.extractAll
". Is there a way to use flatMap
to achieve what I'm trying to achieve?