I followed the quick start tutorial.
My script is:
from pyspark import SparkContext

logFile = 'README.md'
sc = SparkContext('local', 'Simple App')
logData = sc.textFile(logFile).cache()  # cached because the file is scanned twice below
numAs = logData.filter(lambda s: 'a' in s).count()  # lines containing 'a'
numBs = logData.filter(lambda s: 'b' in s).count()  # lines containing 'b'
print 'Lines with a: %i, lines with b: %i' % (numAs, numBs)
I ran the script from the command line:
$SPARK_HOME/bin/spark-submit --master local[2] SimpleApp.py
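As a side note (probably unrelated to the error below), the script hard-codes 'local' as the master while the command also passes --master local[2]. A minimal sketch that lets spark-submit supply the master instead, using the standard SparkConf API:

from pyspark import SparkConf, SparkContext

# Set only the app name here and let spark-submit's --master flag
# decide where the job runs.
conf = SparkConf().setAppName('Simple App')
sc = SparkContext(conf=conf)

Either way, the spark-submit run fails with this traceback: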
Traceback (most recent call last):
  File "/home/huayu/Programs/Machine_learning/spark_exe/quick_start/SimpleApp.py", line 4, in <module>
    sc = SparkContext('local', 'Simple App')
  File "/home/huayu/Downloads/Software/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/home/huayu/Downloads/Software/spark/python/pyspark/context.py", line 174, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
NameError: global name 'accumulators' is not defined
When I ran the script directly with python SimpleApp.py, it worked fine.
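That difference makes me suspect the two launchers resolve different copies of pyspark. A minimal diagnostic sketch that can be run under both launchers to compare (the file name check_env.py is just an example; the environment variables are the standard Spark ones):

# check_env.py - print which Python and which pyspark each launcher uses
import os
import sys
import pyspark

print 'python executable:', sys.executable
print 'pyspark module:', pyspark.__file__
print 'SPARK_HOME:', os.environ.get('SPARK_HOME')
print 'PYTHONPATH:', os.environ.get('PYTHONPATH')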
I got Spark from https://github.com/GUG11/spark (version 2.1.0) and I use Python 2.7.12.
There is another question about a similar Spark accumulators problem (pyspark ImportError: cannot import name accumulators), but the error message in my case is different.