Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include it to show that I've tried the obvious.
Tools/Jars
- IntelliJ 2016.1
- Java 8
- Scala 2.10.5
- Cassandra 3.x
- spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
- spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
- spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)
object SparkCassandraConnectorJoins {
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeId:Int) {
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
...etc...
}
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
**Error:(88, 21) type mismatch;
found : scala.collection.immutable.IndexedSeq[Nothing]
required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
orgNodeId:Int <- 1 to 2
^**
Later
@slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also keep state from the second statement to fill elements in the NotifHistSingle ctor, so I don't see how to get the equivalent map/flatmap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeIds:Set[Int]) {
def notifHistForOrg(orgNodeId:Int): RDD[NotifHistSingle] = {
for {
notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
}
val emptyTable:RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
val notifHistForAllOrgs:RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}