39
votes

I was trying to run spark-submit and I get "Failed to find Spark assembly JAR. You need to build Spark before running this program." When I try to run spark-shell I get the same error. What do I have to do in this situation?

9
Need more info. How do you package your project? What command line do you use to launch spark-submit? - gasparms
I package it with the command mvn package. - Silver Jay

9 Answers

57
votes

On Windows, I found that if it is installed in a directory that has a space in the path (C:\Program Files\Spark) the installation will fail. Move it to the root or another directory with no spaces.
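
For example, a minimal sketch assuming you moved Spark to C:\Spark (any space-free path will do; adjust the paths to your machine):

:: C:\Spark is an assumed location, not a requirement
setx SPARK_HOME "C:\Spark"
setx PATH "%PATH%;C:\Spark\bin"
:: open a new command prompt afterwards, since setx only affects new sessions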

30
votes

Your Spark package doesn't include compiled Spark code, which is why the spark-submit and spark-shell scripts report this error.

You have to download one of the pre-built versions from the "Choose a package type" section of the Spark download page.
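
For example, a rough sketch on Linux/macOS; the version and archive URL below are only an illustration, pick whichever pre-built package the download page currently offers:

# download and unpack a pre-built package (example version)
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
tar -xzf spark-2.4.8-bin-hadoop2.7.tgz
export SPARK_HOME="$PWD/spark-2.4.8-bin-hadoop2.7"
# should print the version banner instead of the assembly error
"$SPARK_HOME"/bin/spark-submit --version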

12
votes

Try running mvn -DskipTests clean package first to build Spark.
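
A quick sketch of where to run it, assuming a Spark source checkout (the path is hypothetical); build/mvn is the Maven wrapper shipped with the source:

cd /path/to/spark-source   # hypothetical checkout location
./build/mvn -DskipTests clean package
# once the build finishes, the jars should end up under assembly/target/ and spark-shell should start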

4
votes

If your Spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I moved them to "Program_Files", and then the spark-shell command worked in cmd.

3
votes

In my case, I installed Spark with pip3 install pyspark on a macOS system, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:

PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
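
If you're not sure where pip put PySpark, one way to find the directory to use as SPARK_HOME (the path above is just what it happened to be on that machine):

# prints the pyspark install directory; export SPARK_HOME pointing at it, then run your script as above
python3 -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))"
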
3
votes
  1. Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Keep that in mind when you're adding it to PATH, like this: export PATH=$SPARK_HOME/bin:$PATH

  2. Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to maven.

  3. Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.

  4. Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for python/pyspark. You can add more flags for Hive, Kubernetes, etc.

Running pyspark or spark-shell will now start PySpark and the Spark shell, respectively.
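
Following up on step 4, a rough sketch of checking the result; the dist/ and python/dist locations are what make-distribution.sh --pip is expected to produce, so treat them as assumptions:

ls dist/bin                                # spark-shell, spark-submit, pyspark, ...
pip install python/dist/pyspark-*.tar.gz   # install the PySpark package you just built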

2
votes

Just to add to @jurban1997's answer.

If you are running Windows, then make sure that the SPARK_HOME and SCALA_HOME environment variables are set up correctly. SPARK_HOME should point to the Spark installation directory, so that {SPARK_HOME}\bin\spark-shell.cmd exists.
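
A quick sanity check in cmd (a sketch, assuming both variables are already set):

:: both should print your install directories
echo %SPARK_HOME%
echo %SCALA_HOME%
:: this should list the file; "File Not Found" means SPARK_HOME points to the wrong place
dir "%SPARK_HOME%\bin\spark-shell.cmd"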

1
votes

Spark Installation:

For a Windows machine:

Download spark-2.1.1-bin-hadoop2.7.tgz from this site https://spark.apache.org/downloads.html

Unzip it, move the Spark folder to the C:\ drive, and set the environment variable (see the sketch at the end of this answer).

If you don't have Hadoop, create a Hadoop folder with a bin folder inside it, and then copy winutils.exe into that bin folder.

Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master

Paste the winutils.exe file into the Hadoop\bin folder and add C:\hadoop\bin to the PATH environment variable.

Create a tmp\hive folder on the C:\ drive and give this folder full permissions, like this:

C:\Windows\system32>C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive

Open a command prompt, first run C:\hadoop\bin\winutils.exe, and then navigate to C:\spark\bin and run spark-shell.
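
A sketch of the environment-variable setup mentioned above, assuming Spark was unpacked to C:\spark and winutils.exe sits in C:\hadoop\bin (adjust to your actual paths):

:: assumed locations from the steps above
setx SPARK_HOME "C:\spark"
setx HADOOP_HOME "C:\hadoop"
setx PATH "%PATH%;C:\spark\bin;C:\hadoop\bin"
:: then open a new command prompt and run:
winutils.exe chmod 777 /tmp/hive
spark-shell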

1
votes

If you have downloaded the binary and are getting this exception, please check whether your SPARK_HOME path contains spaces, like "apache spark"/bin.

Just removing the spaces will make it work.