I have been using PySpark (with Python 2.7) in an IPython notebook on Ubuntu 14.04 quite successfully by creating a special profile for Spark and starting the notebook with $ ipython notebook --profile spark. The mechanism for creating the Spark profile is described on many websites; I used the one linked here.
The file $HOME/.ipython/profile_spark/startup/00-pyspark-setup.py contains the following code:
import os
import sys
# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = '/home/osboxes/spark16'
# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']
# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
I have just created a new Ubuntu 16.04 VM for my students, on which I want them to run PySpark programs in an IPython notebook. Python and PySpark themselves work well. We are using Spark 1.6.
However, I have discovered that current versions of the IPython notebook (now the Jupyter notebook), whether installed through Anaconda or with sudo pip install ipython, do not support the --profile option, and all configuration parameters have to be specified in the ~/.jupyter/jupyter_notebook_config.py file.
Can someone please help me with the config parameters that I need to put into this file? Or is there an alternative solution? I have tried findspark, as explained here, but could not make it work. findspark installed, but findspark.init() failed, possibly because it was written for Python 3.
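For reference, here is a minimal sketch of what I understand findspark.init() is meant to do, written by hand so that it can run in the first notebook cell under Python 2.7. The helper names spark_python_paths and add_spark_to_sys_path are my own (hypothetical), and the sketch assumes that Spark keeps its bundled py4j as a zip archive under python/lib; I have not confirmed that this is the piece missing on the students' VMs.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the sys.path entries PySpark needs under spark_home."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: py4j ships as python/lib/py4j-<version>-src.zip.
    # Match any version instead of hard-coding one.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Prepend the PySpark paths to sys.path, skipping ones already present."""
    # Insert in reverse so the final order matches spark_python_paths().
    for path in reversed(spark_python_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Calling add_spark_to_sys_path(os.environ.get('SPARK_HOME', '/home/osboxes/spark16')) before from pyspark import SparkContext would then play the role of the old startup script, without needing a profile.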
My problem is that everything works fine with the old IPython installation on my own machine, but my students, who are installing everything from scratch, cannot get PySpark going on their VMs.
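One alternative I am considering, since profiles are gone from the notebook command, is a dedicated Jupyter kernel whose kernel.json carries the Spark environment variables. The sketch below only builds and writes such a file. The function names, the display name, the py4j zip file name passed in, and the python -m ipykernel launch command are all assumptions on my part that would need checking against the installed versions.

```python
import json
import os


def build_kernel_spec(spark_home, py4j_zip, display_name="PySpark (Spark 1.6)"):
    """Build a kernel.json dictionary that exposes Spark to the kernel."""
    python_dir = os.path.join(spark_home, "python")
    return {
        "display_name": display_name,
        "language": "python",
        # Assumption: the kernel is started through the ipykernel module.
        "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
        "env": {
            "SPARK_HOME": spark_home,
            # Both the PySpark sources and the py4j archive must be importable.
            "PYTHONPATH": os.pathsep.join([python_dir, py4j_zip]),
        },
    }


def write_kernel_spec(kernel_dir, spec):
    """Write spec to kernel_dir/kernel.json and return the file path."""
    if not os.path.isdir(kernel_dir):
        os.makedirs(kernel_dir)
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=2)
    return path
```

If this is a sensible route, I would write the file to a per-user kernel directory such as ~/.local/share/jupyter/kernels/pyspark and have the students pick that kernel in the notebook. The SparkContext would still have to be created by hand in the first cell.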