
I'm looking for the simplest way to correct my Spark installation and setup so that I can properly run this in a Jupyter notebook:

from pyspark import SparkContext
sc = SparkContext()

In the Jupyter notebook, I get a FileNotFoundError pointing at a directory from a previous installation of spark-2.0.0-bin-hadoop2.7:

FileNotFoundError: [Errno 2] No such file or directory: '/Applications/spark-2.0.0-bin-hadoop2.7/./bin/spark-submit': '/Applications/spark-2.0.0-bin-hadoop2.7/./bin/spark-submit'
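
For reference, PySpark builds that spark-submit path from the SPARK_HOME environment variable, so a stale value left over from the old install is the likely culprit. A minimal check from inside the notebook (assuming nothing else overrides it):

import os

# PySpark joins SPARK_HOME with './bin/spark-submit' when it launches the JVM,
# so a leftover value pointing at the deleted 2.0.0 folder produces exactly
# this error.
print(os.environ.get("SPARK_HOME"))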

Do I need to add something to .bashrc or uninstall spark-2.0.0-bin-hadoop2.7 to make this work?

Originally I installed spark-2.0.0-bin-hadoop2.7 but had issues getting the environment variables (e.g. $PATH) to point to the Applications folder. I'm new to setting variables and wasn't able to get through the entire setup correctly, so I deleted the entries I had added to .bashrc and the Spark folder in Applications.

Alternatively, I used brew and pip to install apache-spark (2.4.3) and pyspark (2.4.3). For Java, java -version reports an AdoptOpenJDK build of version 8 (1.8.0_212):

openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)

In the terminal, I'm able to run pyspark successfully; it starts Spark 2.4.3 using Python 2.7.10. However, my Python 3 installation is 3.7.3 (per python3 --version; note that python3 -version with a single dash isn't a valid invocation and only produces the error below):

Unknown option: -e
usage: /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.
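
For context on the 2.7 vs 3.7 mismatch: PySpark chooses its interpreter from the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. A minimal sketch of pinning both to Python 3 before the context is created (the bare python3 name is an assumption; substitute the full path to your interpreter if needed):

import os

# These must be set before SparkContext() is created; Spark reads them when
# it launches the driver- and worker-side Python processes.
os.environ["PYSPARK_PYTHON"] = "python3"         # worker interpreter
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"  # driver interpreter

from pyspark import SparkContext
sc = SparkContext()
print(sc.pythonVer)  # should now report 3.7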

1 Answer


I wrote a Medium post about how to properly set up PySpark with Jupyter in a Mac environment: https://medium.com/albert-franzi/install-pyspark-jupyter-spark-cdb15996dd52

I hope it helps.
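
In case the link goes stale, here is a minimal sketch of one common setup: the findspark package (assumed installed with pip install findspark) locates the Spark installation before pyspark is imported, so the notebook kernel and the terminal agree on which Spark they use. The brew path in the comment is an example; adjust it to your machine:

# findspark resolves SPARK_HOME (or takes a path explicitly) and adds the
# matching pyspark to sys.path for this kernel.
import findspark
findspark.init()  # e.g. findspark.init("/usr/local/Cellar/apache-spark/2.4.3/libexec")

from pyspark import SparkContext
sc = SparkContext()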