I have two case classes and one RDD of each.
case class Thing1(Id: String, a: String, b: String, c: java.util.Date, d: Double)
case class Thing2(Id: String, e: java.util.Date, f: Double)
val rdd1: org.apache.spark.rdd.RDD[Thing1] = ??? // loaded elsewhere
val rdd2: org.apache.spark.rdd.RDD[Thing2] = ??? // loaded elsewhere
I want to create two new RDD[Thing1]s: one containing the elements of rdd1 whose Id is present in rdd2, and another containing the elements of rdd1 whose Id is not present in rdd2.
Here's what I have tried (I looked at this, "Scala Spark contains vs. does not contain", and other Stack Overflow posts, but none of them worked):
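To make the goal concrete, here is a minimal sketch of the split on plain Scala collections rather than RDDs. The sample values and the names `SplitExample`, `things1`, `things2`, `ids`, `present`, and `absent` are made up for illustration; only the two case classes come from my actual code.

```scala
import java.util.Date

case class Thing1(Id: String, a: String, b: String, c: Date, d: Double)
case class Thing2(Id: String, e: Date, f: Double)

object SplitExample {
  // Hypothetical sample data standing in for the contents of rdd1 and rdd2.
  val now = new Date()
  val things1 = Seq(
    Thing1("1", "a1", "b1", now, 1.0),
    Thing1("2", "a2", "b2", now, 2.0),
    Thing1("3", "a3", "b3", now, 3.0)
  )
  val things2 = Seq(Thing2("1", now, 10.0), Thing2("3", now, 30.0))

  // Ids that occur in the second collection.
  val ids: Set[String] = things2.map(_.Id).toSet

  // Split the first collection by whether each element's Id is in `ids`.
  val (present, absent) = things1.partition(t => ids.contains(t.Id))

  def main(args: Array[String]): Unit = {
    println(present.map(_.Id)) // List(1, 3)
    println(absent.map(_.Id))  // List(2)
  }
}
```

This works on a local `Seq` because `Set` has `contains`; what I am after is the equivalent result when both inputs are RDDs.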
val rdd2_ids = rdd2.map(r => r.Id)
val rdd1_present = rdd1.filter(r => rdd2_ids contains r.Id)
val rdd1_absent = rdd1.filter(r => !(rdd2_ids contains r.Id))
But this gives me the following error:
error: value contains is not a member of org.apache.spark.rdd.RDD[String]
I have seen many questions on SO asking how to do similar things, but none of the answers have worked for me. I frequently get an error of the form: value _____ is not a member of org.apache.spark.rdd.RDD[String]
Why are these other answers not working for me, and how can I achieve what I am trying to do?