
I finally have Eclipse set up so that I can use Spark in a worksheet. I have the Scala 2.10.5 library on my build path and have also included this jar: spark-assembly-1.4.1-hadoop2.6.0.jar.

I can do most things on RDDs except map and flatMap. For example, given this data (sampleData.txt):

0,1 0 0
0,2 0 0
1,0 1 0
1,0 2 0
2,0 0 1
2,0 0 2

The following code gives a "macro has not been expanded" error.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD._

object sparkTestWS {
  val conf = new SparkConf().setMaster("local[*]").setAppName("My App")
  val sc = new SparkContext(conf)

  // start model section
  val data = sc.textFile("sampleData.txt")
  val dataM = data.map(x => x)
}

I looked up this macro error, and there's a post saying that it has to do with implicit types and that it will be (or now is) fixed in Scala 2.11, but Spark is on Scala 2.10...

I also wondered whether I might need to import the classes with these functions explicitly, since another post said that some implicit imports need to be made explicit, but so far I haven't been able to figure out what to import. I've tried scala.Array, scala.immutable._, org.apache.spark.rdd._, etc.

Any ideas? There are other posts stating that people are using Spark with Eclipse, so there must be a way to make Spark work there (though the posts don't say whether they are using Scala worksheets). I'm pretty new to Spark and only slightly less new to Scala, so any advice would be greatly appreciated. I really like Scala worksheets, so I'd like to get all this working if possible. Thanks!


1 Answer


Your code looks good to me.

Your problem most likely lies with the worksheets themselves. They are nice, but because they are built on the REPL they are not exactly the same as compiled classes: they do a bunch of extra things to let the code flow (like allowing you to redefine the same variable), and each REPL command is wrapped in its own scope, which can mess with implicits, imports, etc. in subtle ways.
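To give a rough idea (this is a simplified sketch of the idea, not the actual code the REPL generates, and the $stmt names are purely illustrative), each statement gets compiled into its own synthetic wrapper, with earlier wrappers imported into later ones:

// Hypothetical illustration of how a REPL-style worksheet wraps statements
object $stmt1 {
  val data = sc.textFile("sampleData.txt")
}
object $stmt2 {
  import $stmt1._              // earlier results pulled into scope
  val dataM = data.map(x => x) // implicits must be re-resolved inside this synthetic scope
}

Every extra layer of wrapping is one more place where implicit resolution (and macro expansion) can behave differently than it would in an ordinary compiled object.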

If you are new to both Scala and Spark, I would recommend using compiled classes for the time being and postponing worksheets until you get a better grasp of the fundamentals.
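For example, a minimal compiled version of your code might look like this (a sketch assuming sampleData.txt sits in the working directory; the object name is just a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SparkTest {
  def main(args: Array[String]): Unit = {
    // local[*] = run locally using all available cores
    val conf = new SparkConf().setMaster("local[*]").setAppName("My App")
    val sc = new SparkContext(conf)

    val data = sc.textFile("sampleData.txt")
    val dataM = data.map(x => x)

    // collect() is an action, so it forces the lazy map to actually run
    dataM.collect().foreach(println)

    sc.stop()
  }
}

Run it as a regular Scala application in Eclipse. Note that map is lazy, so you need an action like collect() to see any output at all.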

That said, have you tried spark-shell?
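In spark-shell a SparkContext is already created for you as sc, so you can paste the model section in directly, for example:

scala> val data = sc.textFile("sampleData.txt")
scala> val dataM = data.map(x => x)
scala> dataM.collect().foreach(println)

If the same code works there, that's a strong hint the problem is the Eclipse worksheet environment rather than your code or your Spark setup.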