2
votes

I'm new to Spark and following a tutorial to learn it. I have installed OpenJDK "1.8.0_121" (web binary), Hadoop 2.8.0 (web binary), Scala 2.11.8 (apt), and Spark 2.1.1 (web binary, pre-built for Hadoop 2.6.0 or later).

I ran the SparkPi example successfully. But an error appears when I try to package my first Spark app with sbt 0.13.15 (apt), which was installed the way the official site describes.

I know there must be a mistake in my settings somewhere, but I have failed to find it from this link. Could anyone help me? Thanks :)

My project is like :

---SparkApp
  |---simple.sbt
  |---src
      |---main
          |---scala
              |--- SimpleApp.scala

The .sbt file in my project is:

name := "Simple Project"

version := "0.13.15"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"

The error log looks like this:

hadoop@master:~/Mycode/SparkApp$ sbt package
[warn] Executing in batch mode.
[warn]   For better performance, hit [ENTER] to switch to interactive mode, or
[warn]   consider launching sbt without any commands, or explicitly passing 'shell'
[info] Loading project definition from /home/hadoop/Mycode/SparkApp/project
[info] Set current project to Simple Project (in build file:/home/hadoop/Mycode/SparkApp/)
[info] Compiling 1 Scala source to /home/hadoop/Mycode/SparkApp/target/scala-2.11/classes...
[error] missing or invalid dependency detected while loading class file 'SparkContext.class'.
[error] Could not access term akka in package <root>,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'SparkContext.class' was compiled against an incompatible version of <root>.
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 2 s, completed May 16, 2017 1:08:53 PM

Some hints about what the problem might be:

  1. When I type spark-shell, I get Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131), which differs from what java -version reports: openjdk version "1.8.0_121". Could this be the problem?
  2. I didn't do anything after installing sbt. Should I do some setup, such as letting sbt know where my Scala and Spark are located? How?
  3. I don't have Maven installed. Should I?

------------------------ Second edit -------------------

After adding -Ylog-classpath to the .sbt file, as this link suggests, I got a very long classpath printout that is too long to show here. The problem is still unsolved.
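
For reference, a compiler flag like this is normally passed to scalac through sbt's scalacOptions setting; this is a minimal sketch of how the flag can be added to simple.sbt (my actual line may differ slightly):

// Ask sbt to pass -Ylog-classpath to scalac so the compile classpath gets printed
scalacOptions += "-Ylog-classpath"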

As requested, here is SimpleApp.scala:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
@Rahul Should I try to use IntelliJ? – yui_frank
You can try. IntelliJ is the best IDE around when it comes to Spark & Scala development, and plugins are readily available. – Rahul
Also, can you please add % "provided" at the end of your dependency in build.sbt and try? libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1" % "provided" – Rahul
@Rahul Thanks for your advice; my problem has been solved. I still used your suggestion in my code. – yui_frank

2 Answers

1
votes

tl;dr If you want to develop Spark applications, you don't have to install Spark.

Having Spark installed locally does help a lot in your early days as a Spark developer (with tools like spark-shell and spark-submit), but it is not required, just highly recommended.

In other words, what you've installed as a Spark package has nothing to do with what you can and want to use while developing a Spark application.

In sbt-managed Scala projects, you define what you want to use as a dependency, including the Spark dependency, in the libraryDependencies setting as follows:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"

And to my great surprise, you did that.

It appears that you use two different project directories to explain what you're doing: ~/Mycode/SparkApp (in which you execute sbt package) and ---Pro (whose build.sbt you show).

Assuming your simple.sbt looks as follows:

name := "Simple Project"

version := "0.13.15"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"

I could find only one issue: the version setting, which I believe you set to 0.13.15 to reflect the version of sbt.

Please note that they are not related in any way: version is the version of your application, while the version of sbt to use in the project is defined in project/build.properties, which (given the latest sbt version, 0.13.15) should be as follows:

sbt.version = 0.13.15
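
With that in mind, a minimal corrected simple.sbt would keep everything else as-is and use an application version of your own choosing (1.0 here is just an example):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"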

The issue you face while executing sbt package (in /home/hadoop/Mycode/SparkApp) is that your application somehow depends on Akka, as you can see in the error message:

[info] Set current project to Simple Project (in build file:/home/hadoop/Mycode/SparkApp/)
[info] Compiling 1 Scala source to /home/hadoop/Mycode/SparkApp/target/scala-2.11/classes...
[error] missing or invalid dependency detected while loading class file 'SparkContext.class'.
[error] Could not access term akka in package <root>,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'SparkContext.class' was compiled against an incompatible version of <root>.
[error] one error found
[error] (compile:compileIncremental) Compilation failed

As of Spark 1.6 or so, Akka is no longer used by Spark, so I guess the project somehow references Akka libraries that it should not, if they're meant for Spark.
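
Just to illustrate the "conflicting dependencies" hint from the error message: if some other (or stale) dependency were dragging Akka onto the compile classpath, one hypothetical way to rule it out would be an explicit exclusion in simple.sbt. This is only a sketch of the technique, not a confirmed fix for your build:

// Exclude any transitive Akka artifacts that another dependency might pull in
libraryDependencies += ("org.apache.spark" %% "spark-core" % "2.1.1")
  .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))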

Lots of guesswork, which I hope we'll sort out soon.

0
votes

Thanks for everyone's attention. My problem was solved simply by removing the /project and /target folders, which had been generated by an earlier unsuccessful run. The original cause of this pitfall is still unknown, but this outcome is enough for me. Thanks again. :)