Configure Ipython/Jupyter notebook with Pyspark on AWS EMR v4.0.0

Question

I am trying to use IPython notebook with Apache Spark 1.4.0. I have followed the 2 tutorial below to set my configuration

Installing Ipython notebook with pyspark 1.4 on AWS

and

Configuring IPython notebook support for Pyspark

After fisnish the configuration, following is several code in the related files:

1.ipython_notebook_config.py

c=get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser =False
c.NotebookApp.port = 8193

2.00-pyspark-setup.py

import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install

sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

I also add following two lines to my .bash_profile:

export SPARK_HOME='home/hadoop/sparl'
source ~/.bash_profile

However, when I run

ipython notebook --profile=pyspark

it shows the message: unrecognized alias '--profile=pyspark' it will probably have no effect

It seems that the notebook doesn't configure with pyspark successfully Does anyone know how to solve it? Thank you very much

following are some software version

ipython/Jupyter: 4.0.0

spark 1.4.0

AWS EMR: 4.0.0

python: 2.7.9

By the way I have read the following, but it doesn't work IPython notebook won't read the configuration file

It sounds like the pyspark profile doesn't exist. Does the folder ~/.ipython/profile_pyspark exist? — santon
Hi thank you for your comment. The profile_pyspark exist. The weird thing is that when I add export SPARK_HOME='usr/lib/spark' export Ipython=1 export PYSPARK_PYTHON=/usr/bin/python2.7 export PYSPARK_DRIVER_PYTHON=ipython3 export PYSPARK_DRIVER_PYTHON_OPTS="notebook" to .bashrc and export PYSPARK_PYTHON=/usr/bin/python2.7 export PYSPARK_DRIVER_PYTHON=ipython3 \n \n to spark-env.sh Everything works! — kuan chen

ehdr ehdr · Accepted Answer · 2015-10-26T19:44:06

Jupyter notebooks don't have the concept of profiles (as IPython did). The recommended way of launching with a different configuration is e.g.:

JUPTYER_CONFIG_DIR=~/alternative_jupyter_config_dir jupyter notebook

See also issue jupyter/notebook#309, where you'll find a comment describing how to set up Jupyter notebook with PySpark without profiles or kernels.

Configure Ipython/Jupyter notebook with Pyspark on AWS EMR v4.0.0

5 Answers