The code below causes Spark to become unresponsive:
System.setProperty("hadoop.home.dir", "H:\\winutils");
val sparkConf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
val sc = new SparkContext(sparkConf)

def main(args: Array[String]) {
  val text_file = sc.textFile("h:\\data\\details.txt")
  val counts = text_file
    .flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
  println(counts);
}
I'm setting hadoop.home.dir in order to avoid the error mentioned here: Failed to locate the winutils binary in the hadoop binary path
This is what my build.sbt file looks like:
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "1.6.0"
)
Should a Spark application written in Scala compile and run with the sbt settings in this file?
I think the code is fine, since it was taken verbatim from http://spark.apache.org/examples.html, but I am not sure whether the Hadoop WinUtils path is required.
Update: "The solution was to use fork := true in the main build.sbt" Here is the reference: Spark: ClassNotFoundException when running hello world example in scala 2.11
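For reference, a sketch of how the build.sbt shown earlier might look with that setting applied. The placement of fork := true inside the settings block is an assumption; the quoted answer only says it belongs in the main build.sbt.

```scala
// build.sbt (sketch): same settings as in the question, plus fork := true
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0",
    // Run the application in a separate JVM instead of inside sbt's own JVM.
    fork := true
  )

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "1.6.0"
)
```

With forking enabled, sbt run starts the program in its own JVM, so it no longer shares sbt's class loaders.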
ShuffledRDD[4] at reduceByKey at Application.scala:18
and shuts down. The only thing I see immediately is that no action is used at the end of the transformations, i.e. the data never gets computed and returned to the driver; the code only generates an RDD with a few transformations. But that shouldn't cause the application to hang, especially not in single-threaded local mode (streaming needs at least 2 threads, but you're not using that). – alextsc
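To illustrate the point about the missing action, here is a minimal sketch of the same word count with collect() added, so that the counts are actually computed and printed rather than just the RDD's description. The object name WordCountWithAction and the helper wordCounts are hypothetical names introduced for this example; the paths are the ones from the question. Creating the SparkContext inside main and stopping it at the end is a choice made for this sketch, not something the comment asks for, and it is not claimed to be the cause of the hang.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object name, used only for this sketch.
object WordCountWithAction {

  // Transformations only: this builds the RDD lineage but computes nothing yet.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "H:\\winutils")
    val conf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val counts = wordCounts(sc.textFile("h:\\data\\details.txt"))
      // collect() is an action: it runs the job and returns the results to the driver.
      counts.collect().foreach(println)
    } finally {
      // Shut the context down so the JVM can exit cleanly.
      sc.stop()
    }
  }
}
```

collect() brings every result to the driver, which is fine for a small file; for large outputs an action such as take(n) or saveAsTextFile would be the safer choice.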