I have two RDDs:
rdd1 [String,String,String]: Name, Address, Zipcode
rdd2 [String,String,String]: Name, Address, Landmark
I am trying to join these two RDDs using rdd1.join(rdd2)
But I am getting an error: error: value fullOuterJoin is not a member of org.apache.spark.rdd.RDD[String]
The join should combine the two RDD[String]s, and the output RDD should look something like:
rddOutput : Name,Address,Zipcode,Landmark
In the end, I want to save the result as a JSON file.
Can someone help me with this?
join is defined on pair RDDs, and your rdd1 is not of type RDD[(String, T)]. You should map it first, like this: rdd1.map(v => (v, 1)) (or to another tuple, depending on your task). If you explain your goal in more detail (what you expect to get from the join), you may get more help. - Vitalii Kotliarenko
... RDD[String], but two RDD[String, String, String]. Which field(s) do you want to join on? Name and Address, or just one of those? You need to change the RDDs so that each entry is a pair where the first element is the key and the rest is the value; then join will work. - The Archetypal Paul
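To make the comments above concrete, here is a minimal sketch of keying both RDDs before joining. It rests on several assumptions that are not stated in the question: each RDD is an RDD[String] of comma-separated lines (as the error message suggests), the join key is the pair (Name, Address), and an inner join is what is wanted. The object name, the helper keyByNameAndAddress, the sample rows, and the output path are all made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object JoinByNameAndAddress {

  // Turn "name,address,third" into ((name, address), third).
  // Assumes at least three comma-separated fields; any further commas
  // stay inside the third field. Lines with fewer fields throw a MatchError.
  def keyByNameAndAddress(line: String): ((String, String), String) = {
    val Array(name, address, third) = line.split(",", 3).map(_.trim)
    ((name, address), third)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-sketch")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()
    import spark.implicits._ // needed for .toDF on an RDD of tuples
    val sc = spark.sparkContext

    // Hypothetical sample data standing in for rdd1 and rdd2.
    val rdd1: RDD[String] = sc.parallelize(Seq("Alice,1 Main St,12345", "Bob,2 Oak Ave,67890"))
    val rdd2: RDD[String] = sc.parallelize(Seq("Alice,1 Main St,Near the park"))

    // Both sides are now pair RDDs, so join is available.
    val joined = rdd1.map(keyByNameAndAddress).join(rdd2.map(keyByNameAndAddress))

    // Flatten ((name, address), (zipcode, landmark)) into one 4-tuple per row.
    val rddOutput = joined.map {
      case ((name, address), (zipcode, landmark)) => (name, address, zipcode, landmark)
    }

    // Write one JSON object per line into a directory of part files.
    rddOutput
      .toDF("Name", "Address", "Zipcode", "Landmark")
      .write.mode("overwrite")
      .json("output/joined") // hypothetical output path

    spark.stop()
  }
}
```

With the sample rows above, only the Alice row survives, because the inner join drops keys that appear on one side only. If unmatched rows should be kept, fullOuterJoin can be called on the same keyed RDDs, but each side of the value then becomes an Option that has to be unwrapped before building the output tuple. Note also that write.json produces a directory of part files rather than a single .json file.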