
I am trying to run a simple program that counts words in Scala with Spark. I did the whole installation myself on Linux, but I cannot execute the program because I get this error:

java.lang.ClassNotFoundException: scala.Function0
at sbt.internal.inc.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:74)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at com.twitter.chill.KryoBase$$anonfun$1.apply(KryoBase.scala:41)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.twitter.chill.KryoBase.<init>(KryoBase.scala:41)
at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:57)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:96)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:292)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:277)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:186)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:193)
at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleManager$.canUseSerializedShuffle(SortShuffleManager.scala:187)
at org.apache.spark.shuffle.sort.SortShuffleManager.registerShuffle(SortShuffleManager.scala:99)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:90)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:237)
at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:431)
at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:380)
at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:367)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:850)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1677)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

Basically the code I am trying to execute is:

import org.apache.spark.sql.SparkSession

object SparkWordCount extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("Spark Word Count")
    .getOrCreate()

  val lines = spark.sparkContext.parallelize(
    Seq("Spark Intellij Idea Scala test one",
      "Spark Intellij Idea Scala test two",
      "Spark Intellij Idea Scala test three"))

  val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

  counts.foreach(println)
}

I think it must be due to a mismatch between the Spark and Scala versions, but I cannot find the right solution.

My build.sbt looks like this:

name := "com.example.test"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0"
)

If I execute the same code in a spark-shell it works. I have installed these versions:

  • scala: 2.11.8
  • spark: 2.2.0
  • sbt: 1.0.2

I have also tried different Scala and Spark versions, but nothing works. Could anyone help me?

Thank you in advance, Javi.

  • When I execute the command sbt inside the project, it shows:

    Getting Scala 2.12.3 (for sbt)...

I cannot understand this, as I am specifying the Scala version in build.sbt (scalaVersion := "2.11.8").


1 Answer


I would assume that you are trying to run this code from IntelliJ IDEA. If that's true, the main issue lies in the IDEA configuration.

Go to File > Project Structure > Global Libraries and check whether you have a scala-library entry there. IDEA uses its own copies of the libraries and the SDK, which is why everything works fine in the spark-shell but fails from IDEA.
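
As a quick sanity check, you can try to load the exact class that fails at runtime from a small scratch object (this snippet is just a hypothetical diagnostic, not part of your program):

object ClasspathCheck extends App {
  // If scala-library is on the runtime classpath, this prints the class;
  // otherwise it throws the same ClassNotFoundException as in your stack trace.
  println(Class.forName("scala.Function0"))
}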

I tried your code in a pure sbt layout and it worked with exactly the same build.sbt and source file.
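
For reference, by "pure sbt layout" I mean the standard sbt conventions (the file paths below are just those conventions; pinning sbt to your installed 1.0.2 in project/build.properties is my assumption):

build.sbt
project/build.properties       (contains: sbt.version=1.0.2)
src/main/scala/SparkWordCount.scala

With that layout, running sbt run from the project root should compile against Scala 2.11.8 and execute the program without IDEA's configuration in the way.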

One little thing: maybe IDEA automatically stops all running Spark sessions at the end of a run, but I believe you should explicitly stop the active session. Just put spark.stop() at the end of your executable function or object.
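
For example, a minimal sketch of your own program with the explicit stop added:

import org.apache.spark.sql.SparkSession

object SparkWordCount extends App {

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("Spark Word Count")
    .getOrCreate()

  val lines = spark.sparkContext.parallelize(
    Seq("Spark Intellij Idea Scala test one",
      "Spark Intellij Idea Scala test two",
      "Spark Intellij Idea Scala test three"))

  // Split each line into words, pair each word with 1, and sum the counts per word.
  val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

  counts.foreach(println)

  // Explicitly release the session and its resources once the job is done.
  spark.stop()
}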