
I installed PySpark and IPython Notebook on Ubuntu 12.04.

After installation, running "ipython --profile=pyspark" throws the following exception:

ubuntu_user@ubuntu_user-VirtualBox:~$ ipython --profile=pyspark  
Python 2.7.3 (default, Jun 22 2015, 19:33:41) 
Type "copyright", "credits" or "license" for more information.

IPython 0.12.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

IPython profile: pyspark
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Exception                                 Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    173             else:
    174                 filename = fname
--> 175             __builtin__.execfile(filename, *where)

/home/ubuntu_user/.config/ipython/profile_pyspark/startup/00-pyspark-setup.py in <module>()
      6 sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-'))
----> 8 execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

/home/ubuntu_user/spark/python/pyspark/shell.py in <module>()
     41     SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
---> 43 sc = SparkContext(pyFiles=add_files)
     44 atexit.register(lambda: sc.stop())

/home/ubuntu_user/spark/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    108         """
    109         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110         SparkContext._ensure_initialized(self, gateway=gateway)
    111         try:
    112             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/home/ubuntu_user/spark/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    232         with SparkContext._lock:
    233             if not SparkContext._gateway:
--> 234                 SparkContext._gateway = gateway or launch_gateway()
    235                 SparkContext._jvm = SparkContext._gateway.jvm

/home/ubuntu_user/spark/python/pyspark/java_gateway.pyc in launch_gateway()
     92                 callback_socket.close()
     93         if gateway_port is None:
---> 94             raise Exception("Java gateway process exited before sending the driver its port number")
     96         # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

Below are my settings and configuration files. The Spark installation directory:

ubuntu_user@ubuntu_user-VirtualBox:~$ ls /home/ubuntu_user/spark
bin          ec2       licenses  README.md
CHANGES.txt  examples  NOTICE    RELEASE
conf         lib       python    sbin
data         LICENSE   R         spark-1.5.2-bin-hadoop2.6.tgz

Below is the IPython profile directory:

ubuntu_user@ubuntu_user-VirtualBox:~$ ls .config/ipython/profile_pyspark/
db              ipython_config.py           log  security
history.sqlite  ipython_notebook_config.py  pid  startup

IPython and Spark (PySpark) configuration:

ubuntu_user@ubuntu_user-VirtualBox:~$ vi .config/ipython/profile_pyspark/ipython_notebook_config.py

# Configuration file for ipython-notebook.

c = get_config()

# IPython PySpark
c.NotebookApp.ip = 'localhost'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 7770

ubuntu_user@ubuntu_user-VirtualBox:~$ vi .config/ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-'))

execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

I set the following environment variables in .bashrc:

ubuntu_user@ubuntu_user-VirtualBox:~$ vi .bashrc 
export SPARK_HOME="/home/ubuntu_user/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"

I am new to Apache Spark and IPython. How can I solve this issue?


3 Answers


I got the same exception when my virtual machine did not have enough memory for Java. After I allocated more memory to the virtual machine, the exception went away.

Steps: Shut down the VM -> VirtualBox Settings -> "System" tab -> Set the memory

(However, this may only be a workaround. I guess the correct fix might be to configure Spark's Java memory settings properly.)
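If you would rather cap Spark's memory than resize the VM, one option is to pass spark-submit's --driver-memory flag through PYSPARK_SUBMIT_ARGS. Below is a minimal sketch of that idea. The helper name build_submit_args and the 512m value are my own illustrative choices, not something from the question, and I am assuming Spark 1.4 or later, where the argument string is expected to end with the pyspark-shell placeholder as the primary resource.

```python
import os


def build_submit_args(master="local[2]", driver_memory="512m"):
    """Compose a PYSPARK_SUBMIT_ARGS value with an explicit driver memory cap.

    build_submit_args is a hypothetical helper; 512m is only an example value.
    """
    return "--master %s --driver-memory %s pyspark-shell" % (master, driver_memory)


# This has to run before pyspark/shell.py launches the Java gateway,
# for example at the top of 00-pyspark-setup.py.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()
```

The same string can be exported from .bashrc instead. Pick a memory value that actually fits inside what the VM has available.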


Maybe Spark is failing to locate the pyspark shell. Try:

export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

This works for Spark 1.6.1. If you have a different version, locate the py4j .zip file under $SPARK_HOME/python/lib and use its path in the export instead.
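Since the py4j version in the file name changes between Spark releases, the lookup can also be done from Python instead of hardcoding it. This is only a sketch: the function names are made up for illustration, and it assumes the archive follows the py4j-*-src.zip naming used in the export above.

```python
import glob
import os
import sys


def find_py4j_zip(spark_home):
    """Return the path of the bundled py4j source zip, or None if absent."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None


def add_spark_to_path(spark_home):
    """Put Spark's Python sources and the py4j zip at the front of sys.path."""
    py4j_zip = find_py4j_zip(spark_home)
    if py4j_zip is None:
        raise IOError("no py4j-*-src.zip under %s/python/lib" % spark_home)
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, py4j_zip)
    return py4j_zip
```

In a startup script you could call add_spark_to_path(os.environ["SPARK_HOME"]) before importing pyspark.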


Two thoughts. First, where is your JDK? I don't see JAVA_HOME configured in your files. Setting it might be enough, given this error:

Error: Must specify a primary resource (JAR or Python or R file)

Second, make sure port 7770 is open and available to your JVM.
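Both checks can be run quickly from a plain Python prompt before starting IPython. The snippet below is a rough sketch using only the standard library. The function names are hypothetical, and it assumes the JDK keeps its launcher at $JAVA_HOME/bin/java, which is the usual layout on Linux.

```python
import os
import socket


def java_home_problem(environ=None):
    """Return a description of a JAVA_HOME problem, or None if it looks usable."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        return "no java binary at %s" % java_bin
    return None


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently accepting connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0
    finally:
        sock.close()
```

For example, print(java_home_problem()) and print(port_is_free(7770)) show whether either of the two points above applies to your machine.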