I have two RDDs of (key, value) pairs. The second RDD is much smaller than the first. I would like to associate each value of the first RDD with the corresponding value in the second RDD, matched by key.
val rdd1: RDD[(Key, A)]
val rdd2: RDD[(Key, B)]
val rdd3: RDD[R]
Here rdd1.count() >> rdd2.count(), and multiple elements of rdd1 have the same key.
When a key from rdd1 has no corresponding key in rdd2, I want to use a constant value c in place of b. I thought that leftOuterJoin would be the natural method to use here:
val rdd3 = rdd1.leftOuterJoin(rdd2).map {
  case (key, (a, None))    => R(a, c)
  case (key, (a, Some(b))) => R(a, b)
}
Does anything strike you as wrong here? I am getting unexpected results when joining elements like this.
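In case it helps, here is a minimal self-contained sketch of the setup. The types and data are made up for illustration (Int keys, String for A, Double for B, a default of 0.0 for c, and the case class R are all placeholders, not my real code), and the helper name joinWithDefault is hypothetical. It collapses the two cases into a single getOrElse, which should be equivalent to the pattern match above:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LeftOuterJoinSketch {
  // Placeholder result type; the real R is not shown in this post.
  case class R(a: String, b: Double)

  // Hypothetical helper: left outer join, falling back to the constant c
  // whenever a key of rdd1 has no match in rdd2.
  def joinWithDefault(
      rdd1: RDD[(Int, String)],
      rdd2: RDD[(Int, Double)],
      c: Double
  ): RDD[R] =
    rdd1.leftOuterJoin(rdd2).map {
      case (_, (a, bOpt)) => R(a, bOpt.getOrElse(c))
    }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("left-outer-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // rdd1 is larger and has repeated keys; key 3 is missing from rdd2.
      val rdd1 = sc.parallelize(Seq((1, "a1"), (1, "a2"), (2, "a3"), (3, "a4")))
      val rdd2 = sc.parallelize(Seq((1, 10.0), (2, 20.0)))
      val rdd3 = joinWithDefault(rdd1, rdd2, c = 0.0)
      rdd3.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

As long as the keys in rdd2 are unique, I would expect exactly one R per element of rdd1, in no particular order.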