how to access pyspark from jupyter notebook

Question

I have been using pyspark [ with python 2.7] in an ipython notebook on Ubuntu 14.04 quite successfully by creating a special profile for spark and starting the notebook by calling $ipython notebook --profile spark. The mechanism for creating the spark profile is given on many websites but i have used the one given in here.

and the $HOME/.ipython/profile_spark/startup/00-pyspark-setup.py contains the following code

import os
import sys
# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = '/home/osboxes/spark16'
# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']
# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

I have just created a new VM of Ubuntu 16.04 for my students where I want them to run pyspark programs in ipython notebook. Python, Pyspark is working quite well. We are using Spark 1.6.

However I have discovered that the current versions of ipython notebook [ or jupyter notebook ] whether downloaded through Anaconda or installed with sudo pip install ipython .. DO NOT SUPPORT the --profile option and all configuration parameters have to be specified in the ~/.jupyter/jupyter_notebook_config.py file.

Can someone please help me with the config parameters that I need to put into this file? Or is there an alternative solution? I have tried the findshark() explained here but could not make it work. Findspark got installed but findspark.init() failed, possibly because it was written for python 3.

My challenge is that everything is working just fine on my old installation of ipython on my machine but my students who are installing everything from scratch cannot get pyspark going on their VMs.

Sharop Sharop · Accepted Answer · 2016-07-16T20:27:12

i work with spark just for test purpose locally from ~/apps/spark-1.6.2-bin-hadoop2.6/bin/pyspark

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook"   ~/apps/spark-1.6.2-bin-hadoop2.6/bin/pyspark

how to access pyspark from jupyter notebook

3 Answers