0 votes

I'm trying to run a sample Scala program with spark-submit, building it with SBT. This is my Scala code -

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices 
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}

And this is my sparksample.sbt file -

name := "Spark Sample"

version := "1.0"

scalaVersion := "2.9.1"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

But when I run SBT and the package command, I get the error below:

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.9.1;2.0.0: not found

My Scala version is 2.9.1 and my Spark version is 2.0.0.

I'm following the site below for running spark-submit using SBT -

https://www.supergloo.com/fieldnotes/apache-spark-cluster-part-2-deploy-a-scala-program-to-spark-cluster/


3 Answers

2 votes

It's important to know how to reason about this problem, which happens often in development on the JVM.

In the Scala ecosystem, where binary compatibility is a concern, it is common for dependencies to have artifactIds named with the version of Scala they were compiled with. For example, the latest version of Spark has these coordinates:

groupId: 'org.apache.spark'
artifactId: 'spark-core_2.11'
version: '2.1.0'

The artifactId indicates that this dependency was compiled with Scala 2.11.

Meanwhile, SBT offers a shorthand so you don't have to append the Scala version you've already specified to every dependency listed in libraryDependencies. It does so with the %% between the groupId and artifactId. (You can use a single % and spell out the Scala suffix in the artifactId yourself, but that isn't common among SBT users.)
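To make that concrete, assuming scalaVersion := "2.11.8", the two lines below resolve to exactly the same artifact; the second just spells the Scala suffix out by hand:

// %% appends the Scala binary version, so this resolves to spark-core_2.11
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

// Equivalent, using a single % with the suffix written explicitly
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"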

Put all this together, and you are implying in your build.sbt that your project has a dependency with coordinates org.apache.spark:spark-core_2.9.1:2.0.0. But the error says SBT can't find it.

When that happens, there are two possibilities: either the dependency doesn't exist, or you need to add the repository where it does exist to build.sbt. With something as widely available as Spark (and you can confirm this in the documentation), you know it is published to Maven Central. As a search for the artifact you specified shows, it simply doesn't exist.

So then it's time to check the documentation to figure out which artifact you need instead, or to search Maven Central or MVNRepository (which I generally prefer) for the artifacts available for the combination of Scala and Spark versions you want to work with.

In the end, you will find that Scala 2.11.x (not the latest version of Scala, but the latest version Spark works with) is what you want, probably 2.11.8. And if your environment permits, go with the latest version of Spark too, which is 2.1.0:

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0", 
  "org.apache.spark" %% "spark-sql" % "2.1.0"
)
2 votes

unresolved dependency: org.apache.spark#spark-core_2.9.1;2.0.0: not found

There's no Scala 2.9.1 build of Spark 2.0.0, hence the error message.

sbt could not have been more correct. The blog post is very old and you'd better forget about it immediately (I really wish it no longer existed). Please use the official Spark documentation instead; your best bet is to start from the Quick Start.


A quick workaround is to change scalaVersion := "2.9.1" in sparksample.sbt to:

scalaVersion := "2.11.8"

and you should be fine.

PROTIP: Rename sparksample.sbt to build.sbt (and your teammates will love you again ;-))
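For reference, a minimal build.sbt along these lines might look like the following (Spark 2.0.0 built for Scala 2.11 is published to Maven Central, so the existing dependency line can stay as it is):

name := "Spark Sample"

version := "1.0"

scalaVersion := "2.11.8"

// %% now appends _2.11, so this resolves to spark-core_2.11;2.0.0, which exists
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"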

0 votes

There's no 2.9.1 build of spark-core 2.0; see https://mvnrepository.com/artifact/org.apache.spark. Scala 2.9.1 is quite old and has many compatibility problems with 2.10 and later versions. You should use at least Scala 2.10.
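If you do need to stay on an older Scala line, Spark 2.0.0 was also published for Scala 2.10, so a build along these lines should resolve as well (check MVNRepository for the exact artifact versions available):

scalaVersion := "2.10.6"

// resolves to spark-core_2.10;2.0.0
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"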