I am trying to join two pair RDDs, following the answer provided here:
Joining two RDD[String] -Spark Scala
However, I am getting this error:
error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[
The code snippet is as below.
val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2), field3, field4, field5,
      field6, field7, udfChar1, udfChar2, udfChar3)
}

val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), field8, field9, field10)
}

val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
  case ((field1, field2), (field3, field4, field5, field6,
        field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)) =>
    (field1, field2, field3, field4, field5, field6,
      field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)
}
In this case, field1 and field2 are my keys, on which I want to perform the join.
Comments:

You need to wrap the values in a tuple so each record is a key/value pair:

val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2), (field3, field4, field5, field6, field7,
      udfChar1, udfChar2, udfChar3))
}

– Kaushal

val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), (field8, field9, field10))
}

– Kaushal
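For context on why the commenter's fix works: `leftOuterJoin` (like the other `PairRDDFunctions`) is only available on an `RDD[(K, V)]`, i.e. an RDD of two-element tuples. The original `map` produces 9- and 4-element tuples, so the implicit conversion to `PairRDDFunctions` never applies and the compiler reports that `leftOuterJoin` "is not a member". Note also that Spark's `leftOuterJoin` yields `(K, (V, Option[W]))`, so the downstream pattern match must handle the `Option`. Below is a minimal sketch of the resulting shapes using plain Scala collections as a stand-in for RDDs (so it runs without Spark); the `"f1"`, `"f2"`, … placeholder strings are illustrative, not from the original data:

```scala
// Stand-in with the same (K, (V, Option[W])) result shape as RDD.leftOuterJoin.
def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
  val rightByKey = right.groupBy(_._1)
  left.flatMap { case (k, v) =>
    rightByKey.get(k) match {
      case Some(ws) => ws.map { case (_, w) => (k, (v, Some(w))) }
      case None     => Seq((k, (v, None)))
    }
  }
}

// Keys are (field1, field2); the remaining fields are wrapped in a value tuple,
// not spliced into the key tuple as in the failing code.
val transactions = Seq((("f1", "f2"), ("f3", "f4", "f5", "f6", "f7", "u1", "u2", "u3")))
val accounts     = Seq((("f1", "f2"), ("f8", "f9", "f10")))

val joined = leftOuterJoin(transactions, accounts).map {
  // The right-hand side arrives as Option[(field8, field9, field10)].
  case ((field1, field2), ((f3, f4, f5, f6, f7, u1, u2, u3), Some((f8, f9, f10)))) =>
    (field1, field2, f3, f4, f5, f6, f7, u1, u2, u3, f8, f9, f10)
  case ((field1, field2), ((f3, f4, f5, f6, f7, u1, u2, u3), None)) =>
    (field1, field2, f3, f4, f5, f6, f7, u1, u2, u3, null, null, null)
}
```

The same two-step shape (key tuple, value tuple) applied to `parsedTransaction` and `parsedAccounts` makes `leftOuterJoin` available on the real RDDs.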