3 votes

I have installed the Spark release spark-2.2.0-bin-hadoop2.7.

I'm using Windows 10.

My Java version is 1.8.0_144.

I have set my environment variables:

SPARK_HOME D:\spark-2.2.0-bin-hadoop2.7

HADOOP_HOME D:\Hadoop (where I put bin\winutils.exe)

PYSPARK_DRIVER_PYTHON ipython

PYSPARK_DRIVER_PYTHON_OPTS notebook

Path includes D:\spark-2.2.0-bin-hadoop2.7\bin
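For reference, this is roughly how I check what the current Command Prompt session actually sees (just a sketch; the where lines only report whether ipython/jupyter resolve to anything on PATH):

  REM print what the current session sees for each variable
  echo %SPARK_HOME%
  echo %HADOOP_HOME%
  echo %PYSPARK_DRIVER_PYTHON%
  echo %PYSPARK_DRIVER_PYTHON_OPTS%
  REM check whether ipython/jupyter resolve to anything on PATH
  where ipython
  where jupyter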

When I launch pyspark from the command line I get this error:

  ipython is not recognized as an internal or external command

I also tried setting PYSPARK_DRIVER_PYTHON to jupyter, but it gives the same error (not recognized as an internal or external command).

Any help please?

3 – Are you sure you have jupyter and ipython installed on your machine? – desertnaut

3 Answers

6 votes

Search your machine for the ipython application; in my case it is in "c:\Anaconda3\Scripts". Then just add that path to the PATH environment variable, for example as shown below.
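For example, to try it in the current Command Prompt session before changing the variable permanently (a sketch only; C:\Anaconda3\Scripts is the location from this answer, so adjust it to wherever ipython.exe actually lives on your machine):

  REM extend PATH for this cmd session only, then relaunch pyspark
  set PATH=%PATH%;C:\Anaconda3\Scripts
  pyspark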

2 votes

On Windows 10 with Anaconda installed, please use the Anaconda Prompt rather than the Windows cmd and launch the Jupyter notebook using the command below:

  pyspark --master local[2]

Please make sure all the configurations mentioned in the question are done.
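Once the notebook opens, a quick sanity check (assuming the pyspark launcher initialized the session as usual, so sc is already defined) is to run a cell like:

  # sc is the SparkContext created by the pyspark launcher
  sc.parallelize(range(100)).sum()   # should return 4950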

1 vote

On Windows 10 I solved it by manually adding the path in the Anaconda / Windows PowerShell prompt:

$env:Path += ";path\to\spark\bin"

Other commands such as "setx" did not work for me.

EDIT: every time I start the Anaconda prompt I need to run the command above again. As soon as I manage to make this solution "definitive" I'll edit my answer (one possible way is sketched after the code below). Finally, I needed to add the path to Scala as well to make it work with Jupyter Notebook, with the following:

$env:Path += ";C:\Program Files (x86)\scala\bin"