
I am trying to join two pair RDDs, following the answer to this question:

Joining two RDD[String] - Spark Scala

I am getting the following error:

error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[

Here is the code:

val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2), field3, field4, field5,
      field6, field7, udfChar1, udfChar2, udfChar3)
}

val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), field8, field9, field10)
}

val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
  case ((field1, field2), (field3, field4, field5, field6,
        field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)) =>
    (field1, field2, field3, field4, field5, field6,
      field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)
}

In this case, field1 and field2 together form the key on which I want to perform the join.

What you are missing is that the values need to be wrapped in a tuple as well. - Kaushal
val pairRDDTransactions = parsedTransaction.map { case (field3, field4, field5, field6, field7, field1, field2, udfChar1, udfChar2, udfChar3) => ((field1, field2), (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3)) } - Kaushal
val pairRDDAccounts = parsedAccounts.map { case (field8, field1, field2, field9, field10) => ((field1, field2), (field8, field9, field10)) } - Kaushal

Answer


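Joins are defined only for RDD[(K, V)], that is, an RDD of Tuple2 objects. They live in PairRDDFunctions, which Spark attaches to an RDD through an implicit conversion only when its elements are pairs; otherwise you get exactly the "is not a member of" error. In your case, however, the elements are arbitrary tuples (Tuple9[_, _, _, _, _, _, _, _, _] for the transactions and Tuple4[_, _, _, _] for the accounts), so this cannot work.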
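To make the required shape concrete, here is a minimal plain-Scala model that runs on ordinary collections, without Spark. The helper leftOuterJoinLocal is hypothetical and not part of Spark's API; it only mimics the semantics of PairRDDFunctions.leftOuterJoin: both inputs are sequences of Tuple2 (key, value), and every left record comes back as (key, (leftValue, Option[rightValue])).

```scala
object LeftOuterJoinModel {
  // Hypothetical helper (NOT a Spark API): models leftOuterJoin on local
  // collections. Both inputs must be sequences of Tuple2, i.e. (key, value).
  def leftOuterJoinLocal[K, V, W](left: Seq[(K, V)],
                                  right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Collect all right-side values per key, preserving their order.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    left.flatMap { case (k, v) =>
      // One Some(w) per matching right value, or a single None if nothing matches.
      val matches: Seq[Option[W]] = rightByKey.get(k) match {
        case Some(ws) => ws.map(w => Some(w))
        case None     => Seq(None)
      }
      matches.map(m => (k, (v, m)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data keyed by (field1, field2).
    val transactions = Seq((("A", "1"), "txn-1"), (("B", "2"), "txn-2"))
    val accounts     = Seq((("A", "1"), "acct-1"))
    leftOuterJoinLocal(transactions, accounts).foreach(println)
  }
}
```

Running main prints ((A,1),(txn-1,Some(acct-1))) followed by ((B,2),(txn-2,None)): the unmatched transaction is kept, with None in place of the account value.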

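Instead, map each record to a (key, value) pair, packing the remaining fields into a single tuple as the value:

... =>
  ((field1, field2),
     (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))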

and

... =>
  ((field1, field2), (field8, field9, field10))

respectively.
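With both RDDs keyed this way, leftOuterJoin type-checks, but the map that follows it in the question has to change too: each joined element has the shape (key, (transactionValue, Option[accountValue])), where the Option is None when no account shares the key. Below is a minimal end-to-end sketch, assuming spark-core is on the classpath, that every field is a String (the question does not give the types), and made-up sample rows; the names Txn, Acct and joinTransactions are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransactionAccountJoin {
  // Assumption: every field is a String; the question does not give the types.
  type Txn  = (String, String, String, String, String, String, String, String, String, String)
  type Acct = (String, String, String, String, String)

  // Hypothetical helper wrapping the corrected pipeline.
  def joinTransactions(parsedTransaction: RDD[Txn], parsedAccounts: RDD[Acct]) = {
    // Key = (field1, field2); value = ONE tuple holding the remaining fields.
    val pairRDDTransactions = parsedTransaction.map {
      case (field3, field4, field5, field6, field7, field1, field2, udfChar1, udfChar2, udfChar3) =>
        ((field1, field2), (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))
    }
    val pairRDDAccounts = parsedAccounts.map {
      case (field8, field1, field2, field9, field10) =>
        ((field1, field2), (field8, field9, field10))
    }

    // Both are now RDD[(K, V)], so leftOuterJoin is available. Each element is
    // (key, (transactionValue, Option[accountValue])); None means no account matched.
    pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
      case ((field1, field2),
            ((field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3), account)) =>
        (field1, field2, field3, field4, field5, field6, field7,
          udfChar1, udfChar2, udfChar3,
          account.map(_._1), account.map(_._2), account.map(_._3))
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("left-outer-join-sketch").setMaster("local[*]"))
    try {
      // Made-up rows: only the ("A", "1") transaction has a matching account.
      val transactions = sc.parallelize(Seq[Txn](
        ("t3", "t4", "t5", "t6", "t7", "A", "1", "u1", "u2", "u3"),
        ("t3", "t4", "t5", "t6", "t7", "B", "2", "u1", "u2", "u3")))
      val accounts = sc.parallelize(Seq[Acct](("a8", "A", "1", "a9", "a10")))
      joinTransactions(transactions, accounts).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Each output row is a flat 13-field tuple. The account fields are kept as Option values rather than replaced with defaults, so rows without a matching account stay visible as None. On Spark versions before 1.3, the implicit conversion to PairRDDFunctions also required import org.apache.spark.SparkContext._; later versions find it automatically.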