
I am new to Scala and SBT build files. From the introductory tutorials, adding Spark dependencies to a Scala project should be straightforward via the sbt-spark-package plugin, but I am getting the following error:

[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/spark/SparkContext

Please provide resources to learn more about what could be driving this error, as I want to understand the process more thoroughly.

CODE:

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {

  // Lazily create (or reuse) a local SparkSession for the application
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("spark citation graph")
      .getOrCreate()
  }

  val sc = spark.sparkContext

}


import org.apache.spark.graphx.GraphLoader

object Test extends SparkSessionWrapper {

  def main(args: Array[String]): Unit = {
    println("Testing, testing, testing, testing...")

    val filePath = "Desktop/citations.txt"
    val citeGraph = GraphLoader.edgeListFile(sc, filePath)
    println(citeGraph.vertices.take(1))
  }
}

plugins.sbt

resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")

build.sbt -- WORKING. Why does adding libraryDependencies make it run/work?

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0"
)

build.sbt -- NOT WORKING. I would expect this to compile and run correctly:

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

Bonus for an explanation + links to resources to learn more about the SBT build process, jar files, and anything else that can help me get up to speed!


1 Answer


The sbt-spark-package plugin adds the Spark dependencies in the provided scope:

sparkComponentSet.map { component =>
  "org.apache.spark" %% s"spark-$component" % sparkVersion.value % "provided"
}.toSeq
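
In other words, with only sparkComponents in build.sbt, the effective dependency declarations are roughly equivalent to spelling them out by hand with the provided qualifier, along the lines of this sketch (versions taken from the question's build.sbt):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-graphx" % "2.2.0" % "provided"
)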

We can confirm this by running show libraryDependencies from sbt:

[info] * org.scala-lang:scala-library:2.11.12
[info] * org.apache.spark:spark-core:2.2.0:provided
[info] * org.apache.spark:spark-sql:2.2.0:provided
[info] * org.apache.spark:spark-graphx:2.2.0:provided

The provided scope means:

The dependency will be part of compilation and test, but excluded from the runtime.

Thus sbt run throws java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This also answers why the WORKING build.sbt runs: its explicit libraryDependencies entries are not marked provided, so those copies of the Spark artifacts do end up on the runtime classpath.

If we really want to include provided dependencies on the run classpath, then @douglaz suggests:

run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
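
Putting it together, a minimal sketch of the NOT WORKING build.sbt with that override added (keeping the plugin-managed, provided Spark dependencies):

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

// put the provided Spark jars back on the classpath used by sbt run
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated

With this in place, sbt run uses the full compile classpath, so the Spark classes are found at runtime, while the packaged artifact still treats them as provided.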