1
votes

I am new to PySpark and nothing seems to be working. Please help. I want to read a parquet file with PySpark, so I wrote the following code:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

sqlContext.read.parquet("my_file.parquet")

I got the following error:

Py4JJavaError                             Traceback (most recent call last)
/usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                 "An error occurred while calling {0}{1}{2}.\n".
--> 319                 format(target_id, ".", name), value)
    320             else:

Then I tried the following code:

from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()

SQLContext.read.parquet("my_file.parquet")

This time the error was:

AttributeError: 'property' object has no attribute 'parquet'

2
Try this: SQLContext.read.format("parquet").load("my_file.parquet"). Same error? - Steven
@Steven I tried that but got the same error. I think the problem is in how the SQL context is created. - deega
@deega Could you upload this parquet file somewhere? - Sai

2 Answers

3
votes

You need to create an instance of SQLContext first.

This will work from the pyspark shell, where sc is already defined:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
sqlContext.read.parquet("my_file.parquet")

If you are using spark-submit, you need to create the SparkContext yourself, in which case you would do this:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
sqlContext.read.parquet("my_file.parquet")
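
For context on the AttributeError in the question: read is defined as a property on SQLContext, so SQLContext.read on the class itself returns the raw property object rather than a reader. The sketch below reproduces that behaviour in plain Python without Spark. FakeReader and FakeSQLContext are hypothetical stand-ins made up for illustration, not PySpark classes.

```python
# Hypothetical stand-ins (not part of PySpark) that mimic how
# SQLContext exposes `read` as a property.
class FakeReader:
    def parquet(self, path):
        return "would read " + path


class FakeSQLContext:
    @property
    def read(self):
        return FakeReader()


# Accessed on the class, `read` is the raw property object,
# so calling .parquet on it raises AttributeError.
class_level = FakeSQLContext.read
try:
    class_level.parquet("my_file.parquet")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Accessed on an instance, the property runs and returns a reader.
instance_result = FakeSQLContext().read.parquet("my_file.parquet")
```

The same reasoning is why the snippets above call read on the sqlContext instance and not on the SQLContext class.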
-1
votes
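from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Stop the existing context first (assumes sc already exists, as in the pyspark shell)
sc.stop()
# Build a fresh configuration that runs Spark locally on all available cores
conf = SparkConf().setMaster('local[*]')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)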

df = sqlContext.read.parquet("my_file.parquet")

Try this.