0
votes

I have two RDDs with following Structure:

rdd1<String, String>: (str01, str12), (str01, str13), (str02, str13), ..  
rdd2<String, Float>: (str01, 0.1), (str02, 0.3), ..  

I want to join these RDDs to have a new RDD in which str01, str02 in the rdd1 are replaced by their values in rdd2, as follows:

rdd3<String, Float>: (str12, 0.1), (str13, 0.1), (str13, 0.3)  

Then I need to reduce this RDD by key as follows:

rdd4<String, Float>: (str12, 0.1), (str13, 0.1+0.3 = 0.4)  

I tried Left and right outer join but ended with RDD Any idea how to solve this?

1

1 Answers

0
votes

This helps your issue.

val map1=List("str01" -> "str12", "str01" -> "str13", "str02" -> "str13")
val map2=List("str01"->0.1, "str02"->0.3)

val rdd1=sc.parallelize(map1)
val rdd2=sc.parallelize(map2)

val joinedrdd = rdd1.join(rdd2).map(x=> x._2)
val r = joinedrdd.reduceByKey(_+_)

And this rdd r has structure as: RDD[(String, Double)]