
I am trying to do the Spark-RDD equivalent of this local Scala code:

val intList1 = List(1, 2, 3, 4, 5, 6)
val intList2 = List(10, 20, 30, 40, 50, 60)

val divisiblePairs = for (
    int1 <- intList1;
    int2 <- intList2
    if int2 % int1 == 0
) yield (int1, int2)

divisiblePairs.groupBy(_._1) // Map(6 -> List((6,30), (6,60)), ...)

I tried:

val intRDD1 = sc.parallelize(List(1, 2, 3, 4, 5, 6))
val intRDD2 = sc.parallelize(List(10, 20, 30, 40, 50, 60))

val divisiblePairs = for (
    int1 <- intRDD1;
    int2 <- intRDD2
    if int2 % int1 == 0
) yield (int1, int2)

which would still need some extra work (the grouping step), but I am getting errors even in the body of the for-comprehension:

error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Int, Int)]
 required: TraversableOnce[?]
    int2 <- intRDD2
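
If I read the desugaring right, the comprehension expands to roughly the following (my own sketch, not actual compiler output), which puts an RDD where flatMap expects a TraversableOnce:

intRDD1.flatMap(int1 =>
  intRDD2.filter(int2 => int2 % int1 == 0) // the inner expression is an RDD[(Int, Int)]...
         .map(int2 => (int1, int2))        // ...but RDD.flatMap wants Int => TraversableOnce[U]
)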

1 Answer


The error is to be expected: Spark doesn't support nested RDD operations, so you can't use one RDD inside a transformation of another. You can use the cartesian method to do the same thing:

for {
  (int1, int2) <- intRDD1.cartesian(intRDD2)
  if int2 % int1 == 0
} yield (int1, int2)
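
To get all the way to the grouped result from the question, you can follow this with groupByKey. A minimal sketch, assuming the same intRDD1/intRDD2 and SparkContext sc as above; note that groupByKey keeps only the values, so the shape is Int -> Iterable[Int] rather than the local Int -> List[(Int, Int)]:

val divisiblePairs =
  for {
    (int1, int2) <- intRDD1.cartesian(intRDD2)
    if int2 % int1 == 0
  } yield (int1, int2)

// Group the divisible pairs by their first element,
// mirroring the local divisiblePairs.groupBy(_._1)
divisiblePairs.groupByKey().collect()
// e.g. Array((1, CompactBuffer(10, 20, 30, 40, 50, 60)), ..., (6, CompactBuffer(30, 60)))

Be aware that cartesian materializes all pairs before the filter runs, so it can get expensive for large RDDs.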