4
votes

Environment: Python 2.7, Apache Spark 2.1.0, Ubuntu 14.04.

In the pyspark shell I'm getting the following error:

>>> from pyspark.mllib.stat import Statistics
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named stat

Is there a solution?

Similarly:

>>> from pyspark.mllib.linalg import SparseVector
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named linalg

I have numpy installed, and this is my sys.path:

>>> sys.path
['', u'/tmp/spark-2d5ea25c-e2e7-490a-b5be-815e320cdee0/userFiles-2f177853-e261-46f9-97e5-01ac8b7c4987', '/usr/local/lib/python2.7/dist-packages/setuptools-18.1-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/py4j-0.10.4-py2.7.egg', '/home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip', '/home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python', '/home/d066537', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']


2 Answers

1
votes

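Remove the pip-installed pyspark package. Your sys.path lists the egg /usr/local/lib/python2.7/dist-packages/pyspark-2.1.0+hadoop2.7-py2.7.egg ahead of /home/d066537/spark/spark-2.1.0-bin-hadoop2.7/python, so that copy is probably the one being imported instead of the one shipped with Spark: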

sudo -H pip uninstall pyspark
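
If you want to confirm that two copies of pyspark are visible before uninstalling anything, a small check along these lines may help. This is only a sketch: `find_package_copies` is a hypothetical helper written for this answer, not part of pyspark, and it only looks for ordinary packages (a directory with `__init__.py`) in plain directories, unpacked eggs, and zip archives on the search path.

```python
from __future__ import print_function

import os
import sys
import zipfile


def find_package_copies(name, paths=None):
    """Return every entry of `paths` (default: sys.path) that provides package `name`.

    Entries are returned in search order, so the first hit is normally
    the copy that a plain `import name` picks up.
    """
    paths = sys.path if paths is None else paths
    hits = []
    for entry in paths:
        entry = entry or os.getcwd()  # '' on sys.path means the current directory
        if os.path.isdir(entry):
            # Regular directory or unpacked egg: look for <entry>/<name>/__init__.py
            if os.path.isfile(os.path.join(entry, name, "__init__.py")):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):
            # Zipped egg or source zip: look for the same member inside the archive
            with zipfile.ZipFile(entry) as archive:
                if name + "/__init__.py" in archive.namelist():
                    hits.append(entry)
    return hits


if __name__ == "__main__":
    # Print each location that provides pyspark, in import order
    for location in find_package_copies("pyspark"):
        print(location)
```

If this prints more than one location, the first one is most likely shadowing the others. Inside the pyspark shell, `import pyspark; print(pyspark.__file__)` shows which copy was actually loaded.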
0
votes

I have the same problem. The Python file stat.py does not seem to be present in Spark 2.1.x, only in Spark 2.2.x. So it seems that you need to upgrade Spark together with its bundled pyspark (note, however, that Zeppelin 0.7.x does not seem to work with Spark 2.2.x).