In PySpark, consider two RDDs like:
rdd1 = [('my name',5),('name is',4)]
and
rdd2 = [('my',6),('name',10),('is',5)]
where rdd1 holds tuples of bigrams and their counts, and rdd2 holds tuples of the corresponding unigrams and their counts. I want an RDD of 3-element tuples like:
RDD = [ (('my name',5),('my',6),('name',10)) , (('name is',4), ('name',10),('is',5)) ]
I tried rdd2.union(rdd1).reduceByKey(lambda x,y : x+y), but that is not the right approach here, because the keys of the two RDDs are different; they are only related in the sense that each bigram key is made up of two unigram keys.