I am programming with PySpark in the Eclipse IDE and have been trying to transition to Spark 1.4.1 so that I can finally program in Python 3. The following program works in Spark 1.3.1 but throws an exception in Spark 1.4.1:
from pyspark import SparkContext, SparkConf
from pyspark.sql.types import *
from pyspark.sql import SQLContext

if __name__ == '__main__':
    conf = SparkConf().setAppName("MyApp").setMaster("local")
    global sc
    sc = SparkContext(conf=conf)
    global sqlc
    sqlc = SQLContext(sc)
    symbolsPath = 'SP500Industry.json'
    symbolsRDD = sqlc.read.json(symbolsPath)
    print("Done")
The traceback I'm getting is as follows:
Traceback (most recent call last):
  File "/media/gavin/20A6-76BF/Current Projects Luna/PySpark Test/Test.py", line 21, in <module>
    symbolsRDD = sqlc.read.json(symbolsPath) #rdd with all symbols (and their industries
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 582, in read
    return DataFrameReader(self)
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 39, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o18.read. Trace:
py4j.Py4JException: Method read([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
The external libraries I have on the project are:

spark-1.4.1-bin-hadoop2.6/python
spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip
spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip (tried both including and not including this)
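In case the problem is a mismatch between the pyspark package Eclipse puts on the path and the Spark JVM it talks to, here is a quick sanity check one could run inside the program (a minimal sketch; sc is the SparkContext created above, the printed path should point into the spark-1.4.1-bin-hadoop2.6 directory, and sc.version should report 1.4.1):

import pyspark
print(pyspark.__file__)  # which pyspark package is actually being imported
print(sc.version)        # Spark version reported by the running JVM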
Can anybody help me out with what I'm doing wrong?