2
votes

I am trying to run a simple Spark SQL script (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.

spark-submit HousePriceSolution.py

Error:

from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession

Code:

    from pyspark.sql import SparkSession

    PRICE_SQ_FT = "Price SQ Ft"

    if __name__ == "__main__":
        session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

        # Read the CSV with a header row and let Spark infer the column types.
        realEstate = session.read \
            .option("header", "true") \
            .option("inferSchema", value=True) \
            .csv("hdfs:............./RealEstate.csv")

        # Average price per square foot by location, sorted ascending.
        realEstate.groupBy("Location") \
            .avg(PRICE_SQ_FT) \
            .orderBy("avg({})".format(PRICE_SQ_FT)) \
            .show()

        session.stop()

1 Answer

1
votes

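spark-submit is probably pointing to a different Spark installation. SparkSession was introduced in Spark 2.0, so this ImportError usually means the pyspark being imported comes from a 1.x release. Check which Spark version spark-submit uses with the following command: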

spark-submit --version
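
The same check can also be made from inside the Python process, which shows which copy of pyspark is actually imported. This is only a minimal sketch: the helper name describe_pyspark is made up for illustration, and it assumes you save it to a file and run that file through the same spark-submit.

```python
from __future__ import print_function

import sys


def describe_pyspark():
    """Report which pyspark (if any) this interpreter picks up."""
    info = {
        "python": sys.version.split()[0],
        "pyspark_version": None,
        "pyspark_path": None,
        "has_spark_session": False,
    }
    try:
        import pyspark
    except ImportError:
        return info  # pyspark is not importable at all
    # Older releases may not define __version__, hence getattr.
    info["pyspark_version"] = getattr(pyspark, "__version__", None)
    info["pyspark_path"] = getattr(pyspark, "__file__", None)
    try:
        from pyspark.sql import SparkSession  # only exists in Spark 2.0+
        info["has_spark_session"] = True
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_pyspark().items()):
        print("{}: {}".format(key, value))
```

If pyspark_path points into a different Spark directory than the one reported by spark-submit --version, the two installations are being mixed.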

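If the Spark version is correct, check what PYTHONPATH contains (echo $PYTHONPATH); it may include the pyspark library from another Spark version, which would shadow the one you expect. If PYTHONPATH doesn't contain the pyspark library, add it. Python does not expand wildcards in PYTHONPATH entries, so list the python directory and the py4j archive explicitly (the py4j file name depends on your Spark release, so the command below lets the shell resolve it):

export PYTHONPATH="$SPARK_HOME/python:$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip):$PYTHONPATH"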
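
To see at a glance which PYTHONPATH entries relate to Spark, something like the following could help. It is a rough sketch, not an official tool: the function name spark_entries and the substring markers it matches on are assumptions chosen for illustration.

```python
from __future__ import print_function

import os


def spark_entries(pythonpath, sep=os.pathsep):
    """Return PYTHONPATH entries whose path mentions spark or py4j."""
    entries = [e for e in pythonpath.split(sep) if e]
    markers = ("spark", "py4j")  # "spark" also matches "pyspark"
    return [e for e in entries if any(m in e.lower() for m in markers)]


if __name__ == "__main__":
    for entry in spark_entries(os.environ.get("PYTHONPATH", "")):
        # Flag entries that do not exist on disk.
        print("{} (exists: {})".format(entry, os.path.exists(entry)))
```

Since Python treats each entry as a literal path, an entry containing an unexpanded wildcard would normally show up here as not existing, and two entries from different Spark directories indicate a version conflict.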