I am running into a frustrating issue while trying to use groupByKey or any other function of a PairRDD or MappedRDD. What I get is always just a plain RDD, and I don't know how to convert it (I am fairly sure the conversion should be picked up automatically by Scala). My code is the following:
val broadcastedDistanceMeasure = sc.broadcast(dbScanSettings.distanceMeasure)
val distances = input.cartesian(input)
  .filter(t => t._1 != t._2)
  .map {
    case (p1, p2) => p1 -> broadcastedDistanceMeasure.value.distance(p1, p2)
  }
where input is an RDD. According to both Eclipse and sbt run, the resulting type is just an RDD, so I cannot call groupByKey on it. If I try almost the same code in the Spark shell, however, I get a MappedRDD.
This is my build.sbt file:
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
Can anybody help me?
Thanks.
Greetings.
Marco
You cannot use the groupByKey operation just because you are NOT importing org.apache.spark.SparkContext._. - ale64bit
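For context, here is a minimal sketch of what that comment suggests: in Spark 1.x the implicit conversion from RDD[(K, V)] to PairRDDFunctions (which provides groupByKey) lives in org.apache.spark.SparkContext._, and the Spark shell imports it for you automatically, which is why the same code behaves differently there. The Point type and its distance method below are hypothetical, and the broadcast of the distance measure is left out for brevity:

import org.apache.spark.{SparkConf, SparkContext}
// Brings rddToPairRDDFunctions into scope, so RDD[(K, V)] gains groupByKey etc.
import org.apache.spark.SparkContext._

object DistanceExample {
  // Hypothetical point type, only for illustration.
  case class Point(x: Double, y: Double) {
    def distance(other: Point): Double =
      math.sqrt(math.pow(x - other.x, 2) + math.pow(y - other.y, 2))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distances").setMaster("local[*]"))
    val input = sc.parallelize(Seq(Point(0, 0), Point(1, 1), Point(2, 2)))

    val distances = input.cartesian(input)
      .filter(t => t._1 != t._2)
      .map { case (p1, p2) => p1 -> p1.distance(p2) }

    // With the import above, the implicit conversion applies and groupByKey compiles.
    val grouped = distances.groupByKey()
    grouped.collect().foreach(println)

    sc.stop()
  }
}

Adding the single import org.apache.spark.SparkContext._ at the top of the original file should be enough; nothing else in the code needs to change.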