I finally have Eclipse set up so I can use Spark in a Scala worksheet. I have the Scala 2.10.5 library on my build path and have also included this jar: spark-assembly-1.4.1-hadoop2.6.0.jar
I can do most things on RDDs except map and flatMap. For example, given this data (sampleData.txt):
0,1 0 0
0,2 0 0
1,0 1 0
1,0 2 0
2,0 0 1
2,0 0 2
The following code gives a "macro has not been expanded" error.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD._

object sparkTestWS {
  val conf = new SparkConf().setMaster("local[*]").setAppName("My App")
  val sc = new SparkContext(conf)

  // start model section
  val data = sc.textFile("sampleData.txt")
  val dataM = data.map(x => x) // this map call is what triggers the macro error
}
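For contrast, something along these lines runs fine for me in the same worksheet setup (filter and count here are just examples of the "most things" I mentioned above; the object name is made up):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object sparkTestOk {
  val conf = new SparkConf().setMaster("local[*]").setAppName("My App")
  val sc = new SparkContext(conf)

  // Operations that don't go through map/flatMap evaluate without the error
  val lines = sc.textFile("sampleData.txt")
  val nonEmpty = lines.filter(_.nonEmpty) // filter works
  val n = nonEmpty.count()                // count works
}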
I looked up this macro error, and there's a post saying that it has to do with implicit types and that it will be (or now is) fixed in Scala 2.11, but this Spark build is on Scala 2.10...
I also wondered whether I might need to explicitly import the classes that provide these functions, since there was a post saying that some implicit imports need to be made explicit, but so far I haven't been able to figure out what to import. I've tried scala.Array, scala.immutable._, org.apache.spark.rdd._, etc.
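In case it's useful, this is roughly what I mean by making the implicit imports explicit; the SparkContext._ and ClassTag lines are just guesses on my part and may be redundant with the RDD._ import I already have:

import org.apache.spark.SparkContext._ // older home of the RDD implicit conversions
import org.apache.spark.rdd.RDD._      // where the implicits live in Spark 1.3+
import scala.reflect.ClassTag          // map/flatMap take an implicit ClassTag, so I tried this too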
Any ideas? There are other posts stating that people are using Spark with Eclipse, so there must be a way to make Spark work in Eclipse (though those posts don't say whether or not they are using Scala worksheets). I'm pretty new to Spark and only slightly less new to Scala, so any advice would be greatly appreciated. I really like Scala worksheets, so I'd like to get all of this working if possible. Thanks!