16
votes

I just downloaded spark-2.3.0-bin-hadoop2.7.tgz. After downloading, I followed the steps described here: pyspark installation for windows 10. I used the command bin\pyspark to run Spark and got the error message

The system cannot find the path specified

Attached is a screenshot of the error message.

Attached is a screenshot of my Spark bin folder.

A screenshot of my Path variable is also attached. I have Python 3.6 and Java 1.8.0_151 on my Windows 10 system. Can you suggest how to resolve this issue?


8 Answers

15
votes

My problem was that JAVA_HOME was pointing to the JRE folder instead of the JDK. Make sure you take care of that.
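A quick way to tell the two apart from a Command Prompt: a JRE ships bin\java.exe but not the compiler, bin\javac.exe. A minimal check (the JDK path below is only an example; use your own install location):

echo %JAVA_HOME%
dir "%JAVA_HOME%\bin\javac.exe"
rem If javac.exe is not found, JAVA_HOME points at a JRE; repoint it at the JDK root, e.g.:
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_151"
rem (setx only takes effect in newly opened Command Prompt windows)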

15
votes

Actually, the problem was with the JAVA_HOME environment variable path. JAVA_HOME was previously set to .../jdk/bin; stripping the trailing /bin from JAVA_HOME, while keeping it (/jdk/bin) in the system path variable (%PATH%), did the trick.
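In other words, something like the following (paths are examples, not the exact ones from my setup):

rem Before (broken): JAVA_HOME pointed at the bin directory itself
rem   JAVA_HOME = C:\Program Files\Java\jdk1.8.0_151\bin
rem After (working): JAVA_HOME is the JDK root, and PATH keeps the \bin entry
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_151"
rem PATH should still contain %JAVA_HOME%\bin (edit it via the Environment
rem Variables dialog; setx can truncate a long PATH)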

3
votes

I worked on this for hours and hours. My problem was with the Java 10 installation. I uninstalled it and installed Java 8, and now PySpark works.
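For context, Java 11 support only arrived in Spark 3.0, so Spark 2.x needs Java 8. You can verify which version the shell picks up (the version string below is just an example of Java 8 output):

java -version
rem Java 8 reports something like:
rem   java version "1.8.0_151"
rem If this shows 10 or newer, point JAVA_HOME and PATH at a Java 8 install.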

1
votes

Switching SPARK_HOME to C:\spark\spark-2.3.0-bin-hadoop2.7 and changing PATH to include %SPARK_HOME%\bin did the trick for me.

Originally my SPARK_HOME was set to C:\spark\spark-2.3.0-bin-hadoop2.7\bin and PATH was referencing it as %SPARK_HOME%.

Running a Spark command directly in my SPARK_HOME dir worked, but only once. After that initial success I noticed the same error you describe, and echo %SPARK_HOME% was showing C:\spark\spark-2.3.0-bin-hadoop2.7\bin\.. I thought perhaps spark-shell2.cmd had edited it in an attempt to get itself working, which led me here.
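To summarize the working configuration (the directory name matches the question; adjust to your own unpack location):

rem Wrong: SPARK_HOME = C:\spark\spark-2.3.0-bin-hadoop2.7\bin  with PATH using %SPARK_HOME%
rem Right: SPARK_HOME = C:\spark\spark-2.3.0-bin-hadoop2.7     with PATH using %SPARK_HOME%\bin
setx SPARK_HOME "C:\spark\spark-2.3.0-bin-hadoop2.7"
rem then add %SPARK_HOME%\bin to PATH and open a fresh Command Prompt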

0
votes

Most likely you forgot to define the Windows environment variables such that the Spark bin directory is in your PATH environment variable.

Define the following environment variables using the usual methods for Windows.

First, define an environment variable called SPARK_HOME set to C:\spark\spark-2.3.0-bin-hadoop2.7.

Then either add %SPARK_HOME%\bin to your existing PATH environment variable, or, if none exists (unlikely), define PATH to be %SPARK_HOME%\bin.

If there is no typo in the PATH, echo %PATH% should give you the fully resolved path to the Spark bin directory, i.e. it should include

C:\spark\spark-2.3.0-bin-hadoop2.7\bin;

If PATH is correct, you should be able to type pyspark in any directory and it should run.
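A quick end-to-end check from a fresh Command Prompt (values match the paths used in this answer):

echo %SPARK_HOME%
rem expected: C:\spark\spark-2.3.0-bin-hadoop2.7
echo %PATH%
rem expected to contain: C:\spark\spark-2.3.0-bin-hadoop2.7\bin
pyspark
rem should start the PySpark shell from any directory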

If this does not resolve the issue, the problem may be the one described in pyspark: The system cannot find the path specified, in which case this question is a duplicate.

0
votes

Update: in my case it came down to a wrong path for Java; I got it to work...

I was having the same problem. I initially installed Spark through pip, and pyspark ran successfully. Then I started messing with Anaconda updates and it never worked again. Any help would be appreciated...

I'm assuming PATH is set correctly for the original author. A way to check is to run spark-class from a command prompt. With a correct PATH it will print Usage: spark-class <class> [<args>] when run from an arbitrary location. The error from pyspark comes from a chain of .cmd files that I traced to the last lines of spark-class2.cmd.

This may be silly, but altering the last block of code shown below changes the error message you get from pyspark from "The system cannot find the path specified" to "The syntax of the command is incorrect". Removing this whole block makes pyspark do nothing.

rem The launcher library prints the command to be executed in a single line suitable for being
rem executed by the batch interpreter. So read all the output of the launcher into a variable.
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main 
%* > %LAUNCHER_OUTPUT%
for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
  set SPARK_CMD=%%i
)
del %LAUNCHER_OUTPUT%
%SPARK_CMD%

I removed "del %LAUNCHER_OUTPUT%" and saw that the text file generated remains empty. Turns out "%RUNNER%" failed to find correct directory with java.exe because I messed up the PATH to Java (not Spark).

0
votes

I know this is an old post, but I am adding my findings in case they help anyone.

The issue is mainly due to the line source "${SPARK_HOME}"/bin/load-spark-env.sh in the pyspark file. As you can see, it does not expect 'bin' in SPARK_HOME. All I had to do was remove 'bin' from my SPARK_HOME environment variable, changing C:\spark\spark-3.0.1-bin-hadoop2.7\bin to C:\spark\spark-3.0.1-bin-hadoop2.7\, and it worked.

The error in the Windows Command Prompt made it look as if 'pyspark' was not being recognized, while the real issue was that it could not find the file load-spark-env.sh.
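On Windows the launcher goes through the .cmd counterpart of that script, so a quick sanity check is to confirm the file the scripts expect actually exists under %SPARK_HOME%\bin (path from this answer; adjust to your version):

echo %SPARK_HOME%
rem should NOT end in \bin, e.g. C:\spark\spark-3.0.1-bin-hadoop2.7
dir "%SPARK_HOME%\bin\load-spark-env.cmd"
rem if this file is not found, SPARK_HOME points at the wrong level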

0
votes

If you use Anaconda on Windows, the command below can save you time:

conda install -c conda-forge pyspark

After that, restart Anaconda and start Jupyter Notebook.
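To confirm the install worked before opening a notebook, you can run a quick check from the Anaconda Prompt (the version printed depends on what conda installed):

python -c "import pyspark; print(pyspark.__version__)"
rem prints the installed PySpark version if the package imports cleanly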
