1
votes

I am trying to build Spark 1.2 with Maven. My goal is to use PySpark with YARN on Hadoop 2.2.

I read that this is only possible by building Spark with Maven. First, is this true?

If it is true, what is the problem in the log below? How do I correct this?

C:\Spark\spark-1.2.0>mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Core
[INFO] Spark Project Bagel
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project YARN Parent POM
[INFO] Spark Project YARN Stable API
[INFO] Spark Project Assembly
[INFO] Spark Project External Twitter
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External MQTT
[INFO] Spark Project External ZeroMQ
[INFO] Spark Project External Kafka
[INFO] Spark Project Examples
[INFO] Spark Project YARN Shuffle Service
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Parent POM 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-parent ---
[INFO] Deleting C:\Spark\spark-1.2.0\target
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-parent ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-parent ---
[INFO] Source directory: C:\Spark\spark-1.2.0\src\main\scala added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-parent ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-parent ---
[INFO] No sources to compile
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-test-source (add-scala-test-sources) @ spark-parent ---
[INFO] Test Source directory: C:\Spark\spark-1.2.0\src\test\scala added.
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile-first) @ spark-parent ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-dependency-plugin:2.9:build-classpath (default) @ spark-parent ---
[INFO] Wrote classpath file 'C:\Spark\spark-1.2.0\target\spark-test-classpath.txt'.
[INFO]
[INFO] --- gmavenplus-plugin:1.2:execute (default) @ spark-parent ---
[INFO] Using Groovy 2.3.7 to perform execute.
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-parent ---
[INFO]
[INFO] --- maven-shade-plugin:2.2:shade (default) @ spark-parent ---
[INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar.
[INFO] Replacing original artifact with shaded artifact.
[INFO]
[INFO] --- maven-source-plugin:2.2.1:jar-no-fork (create-source-jar) @ spark-parent ---
[INFO]
[INFO] --- scalastyle-maven-plugin:0.4.0:check (default) @ spark-parent ---
[WARNING] sourceDirectory is not specified or does not exist value=C:\Spark\spark-1.2.0\src\main\scala
Saving to outputFile=C:\Spark\spark-1.2.0\scalastyle-output.xml
Processed 0 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 32 ms
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Networking 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-network-common_2.10 ---
[INFO] Deleting C:\Spark\spark-1.2.0\network\common\target
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-network-common_2.10 ---
[INFO] Source directory: C:\Spark\spark-1.2.0\network\common\src\main\scala added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ spark-network-common_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Spark\spark-1.2.0\network\common\src\main\resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[INFO] Compiling 42 Java sources to C:\Spark\spark-1.2.0\network\common\target\scala-2.10\classes...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  5.267 s]
[INFO] Spark Project Networking ........................... FAILURE [  1.922 s]
[INFO] Spark Project Shuffle Streaming Service ............ SKIPPED
[INFO] Spark Project Core ................................. SKIPPED
[INFO] Spark Project Bagel ................................ SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN Parent POM ...................... SKIPPED
[INFO] Spark Project YARN Stable API ...................... SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.048 s
[INFO] Finished at: 2015-02-09T10:17:47+08:00
[INFO] Final Memory: 49M/331M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-network-common_2.10: wrap: java.io.IOException: Cannot run program "javac": CreateProcess error=2, The system cannot find the file specified -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-network-common_2.10

3 Answers

1
votes

I had first installed a JRE instead of a JDK. My environment variables still referenced the JRE folder, so the build couldn't find the javac.exe binary.
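
If you want to confirm this on your own machine, here is a minimal sketch in Python (my own helper, not part of Spark or Maven) that checks whether the folder JAVA_HOME points to actually contains a javac binary. The function name find_javac is just an illustrative choice, and the sketch assumes the usual layout where the compiler lives in the bin subfolder.

```python
import os
import shutil
from pathlib import Path


def find_javac(java_home):
    """Return the path to javac under java_home, or None if it is missing.

    Assumption: a JDK keeps the compiler in <java_home>/bin, while a
    JRE-only install has bin/java but no bin/javac.
    """
    if not java_home:
        return None
    bin_dir = Path(java_home) / "bin"
    # Try the Unix name first, then the Windows name.
    for name in ("javac", "javac.exe"):
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    print("JAVA_HOME        :", java_home)
    print("javac in JAVA_HOME:", find_javac(java_home))
    # What a bare "javac" command resolves to on PATH, if anything.
    print("javac on PATH    :", shutil.which("javac"))
```

If both of the last two lines print None, the environment most likely still points at a JRE (or at nothing), which would match the "Cannot run program "javac"" error in the question.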

1
votes

A quirk of the Spark build is that it can download its own version of Maven if it determines one is required.

When you run ./build/mvn clean package, you are not running Maven directly; you are running a wrapper script that ships with Spark. The first thing that script does is check whether your mvn --version is new enough for the version the project requires (which is set in the pom.xml file).

This matters because if you are running an old version of Maven, the script may download and install a newer Maven and use that instead.

Some key things:

  • When you run ./build/mvn clean package, check which version of Maven it is using
  • When Maven runs, it does its own lookup to determine which JAVA_HOME to use
  • Before trying to run the Spark build, check that JAVA_HOME is set as an environment variable
  • Check that JAVA_HOME points to a full JDK, not just a JRE
  • Update your Maven to the latest version (or check that it is at least as new as the version set in the pom.xml in the root directory)
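
As a rough way to run the version checks from the list above, here is a small Python sketch (mine, not something shipped with Spark) that reads the output of mvn --version and pulls out the Maven version and the "Java home" line. REQUIRED_MAVEN is a placeholder I made up; replace it with whatever minimum your pom.xml actually specifies. The helper names parse_mvn_version and is_at_least are also just illustrative.

```python
import re
import shutil
import subprocess

# Placeholder value, NOT taken from Spark's pom.xml; set it yourself.
REQUIRED_MAVEN = "3.0.4"


def parse_mvn_version(output):
    """Extract (maven_version, java_home) from `mvn --version` text."""
    version = re.search(r"Apache Maven (\d+(?:\.\d+)*)", output)
    java_home = re.search(r"^Java home:\s*(.+)$", output, re.MULTILINE)
    return (
        version.group(1) if version else None,
        java_home.group(1).strip() if java_home else None,
    )


def is_at_least(found, required):
    """Compare dotted numeric versions, padding the shorter with zeros."""
    a = [int(part) for part in found.split(".")]
    b = [int(part) for part in required.split(".")]
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


if __name__ == "__main__":
    mvn = shutil.which("mvn")  # also resolves mvn.cmd / mvn.bat via PATHEXT
    if mvn is None:
        print("mvn not found on PATH")
    else:
        result = subprocess.run([mvn, "--version"], capture_output=True, text=True)
        version, java_home = parse_mvn_version(result.stdout)
        print("Maven version:", version)
        print("Java home    :", java_home)
        if version is not None:
            print("New enough   :", is_at_least(version, REQUIRED_MAVEN))
```

One thing to keep in mind when reading the result: with older JDKs (8 and earlier, as far as I know), the "Java home" line usually ends in the jre subfolder of the JDK, so a path like ...\jdk1.7.0_xx\jre is fine, while a standalone ...\jre7 folder is not.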

Thanks

0
votes

For this problem, you need to set your Java environment path correctly in your .bashrc file. Then make sure Maven is installed correctly and its path is set; to verify, run mvn -version.

The build should then complete without errors.