I have 4 RDDs of type RDD[((Int, Int, Int), value)], and they look like this:
rdd1: ((a,b,c), value)
rdd2: ((a,d,e), valueA)
rdd3: ((f,b,g), valueB)
rdd4: ((h,i,c), valueC)
How can I join these RDDs in Scala, i.e. rdd1 join rdd2 on "a", rdd1 join rdd3 on "b", and rdd1 join rdd4 on "c",
so that the output is finalRdd: ((a,b,c),valueA,valueB,valueC,value)?
I tried doing this with collectAsMap, but it didn't work well and throws an exception.
Code just for rdd1 join rdd2:
val newrdd2 = rdd2.map { case ((a, b, c), d) => (a, d) }.collectAsMap
val joined = rdd1.map { case ((a, b, c), d) => (newrdd2.get(a).get, b, c, d) }
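The exception itself is not shown, so the following is only a guess at one possible cause: `newrdd2.get(a).get` throws a NoSuchElementException whenever `a` has no entry in the collected map. A minimal sketch of a lookup that drops unmatched rows instead of throwing is below. It uses plain Scala collections so it runs without Spark, on the assumption that the same closure body could be passed to `rdd1.flatMap`. The names `SafeLookup` and `attachByFirst` are hypothetical, and the values are assumed to be Strings.

```scala
object SafeLookup {
  type Key = (Int, Int, Int)

  // Attach the value found under the first key component. Rows whose
  // component is missing from the map are dropped instead of throwing.
  def attachByFirst(
      rows: Seq[(Key, String)],
      lookup: scala.collection.Map[Int, String] // the type collectAsMap returns
  ): Seq[(Key, String, String)] =
    rows.flatMap { case (key @ (a, _, _), v) =>
      lookup.get(a).map(vA => (key, v, vA))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(((1, 2, 3), "animals"), ((9, 9, 9), "unmatched"))
    val lookup = Map(1 -> "cat")
    println(attachByFirst(rows, lookup)) // List(((1,2,3),animals,cat))
  }
}
```

Note that a collected map keeps only one value per key, so if rdd2 holds several rows with the same `a`, all but one are lost before the lookup happens.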
Example:
rdd1: ((1,2,3), animals)
rdd2: ((1,anyInt,anyInt), cat)
rdd3: ((anyInt,2,anyInt), cow)
rdd4: ((anyInt,anyInt,3), parrot)
The output should be ((1,2,3),animals,cat,cow,parrot)
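One way the join described above might be expressed without collecting anything to the driver is to re-key each RDD on the shared component, use the pair-RDD `join`, and rebuild the full key at the end. This is a sketch under several assumptions: Spark is on the classpath, the values are Strings, a local master is used for the demo, and the output follows the field order of the example (value first). `JoinOnKeyParts`, `joinAll`, `byA`, `byB` and `byC` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinOnKeyParts {
  type Key = (Int, Int, Int)

  // Re-key on the shared component, join, then move on to the next component.
  def joinAll(
      rdd1: RDD[(Key, String)],
      rdd2: RDD[(Key, String)],
      rdd3: RDD[(Key, String)],
      rdd4: RDD[(Key, String)]
  ): RDD[(Key, String, String, String, String)] = {
    val byA = rdd2.map { case ((a, _, _), vA) => (a, vA) } // keyed by 1st component
    val byB = rdd3.map { case ((_, b, _), vB) => (b, vB) } // keyed by 2nd component
    val byC = rdd4.map { case ((_, _, c), vC) => (c, vC) } // keyed by 3rd component

    rdd1
      .map { case ((a, b, c), v) => (a, (b, c, v)) }
      .join(byA)
      .map { case (a, ((b, c, v), vA)) => (b, (a, c, v, vA)) }
      .join(byB)
      .map { case (b, ((a, c, v, vA), vB)) => (c, (a, b, v, vA, vB)) }
      .join(byC)
      .map { case (c, ((a, b, v, vA, vB), vC)) => ((a, b, c), v, vA, vB, vC) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("join-sketch")
    val sc = new SparkContext(conf)
    try {
      // The 7, 8, 9 values stand in for "anyInt" in the example.
      val rdd1 = sc.parallelize(Seq(((1, 2, 3), "animals")))
      val rdd2 = sc.parallelize(Seq(((1, 7, 8), "cat")))
      val rdd3 = sc.parallelize(Seq(((9, 2, 7), "cow")))
      val rdd4 = sc.parallelize(Seq(((8, 9, 3), "parrot")))
      joinAll(rdd1, rdd2, rdd3, rdd4).collect().foreach(println)
      // prints ((1,2,3),animals,cat,cow,parrot)
    } finally sc.stop()
  }
}
```

Because `join` is an inner join, rows of rdd1 with no match in any of the other three RDDs are dropped, and duplicate keys on either side produce one output row per combination.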
(1,2,3)
and value "animal" and "another-animal") – The Archetypal Paul