I am trying to run a simple Spark SQL script (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.
spark-submit HousePriceSolution.py
Error:
    from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession
Code:
from pyspark.sql import SparkSession

PRICE_SQ_FT = "Price SQ Ft"

if __name__ == "__main__":
    session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

    # Read the CSV with a header row and let Spark infer the column types.
    realEstate = session.read \
        .option("header", "true") \
        .option("inferSchema", value=True) \
        .csv("hdfs:............./RealEstate.csv")

    # Average price per square foot for each location, sorted ascending.
    realEstate.groupBy("Location") \
        .avg(PRICE_SQ_FT) \
        .orderBy("avg({})".format(PRICE_SQ_FT)) \
        .show()

    session.stop()
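
Since SparkSession only exists from Spark 2.0 onward, one possible explanation (not confirmed here) is that spark-submit is resolving to an older pyspark than expected. Below is a minimal diagnostic sketch for checking which pyspark the interpreter actually imports. The helper name describe_pyspark is hypothetical and is not part of the script above. It only reports what it finds and does not start a Spark job.

```python
import importlib


def describe_pyspark():
    """Return a short description of the pyspark visible to this interpreter."""
    try:
        pyspark = importlib.import_module("pyspark")
    except ImportError:
        return "pyspark is not importable from this interpreter"

    # Older releases may not expose __version__, so fall back to a placeholder.
    version = getattr(pyspark, "__version__", "unknown")
    location = getattr(pyspark, "__file__", "unknown location")

    # SparkSession was introduced in Spark 2.0; earlier versions raise ImportError.
    try:
        from pyspark.sql import SparkSession  # noqa: F401
        has_session = True
    except ImportError:
        has_session = False

    return "pyspark {} at {}; SparkSession available: {}".format(
        version, location, has_session
    )


if __name__ == "__main__":
    print(describe_pyspark())
```

Submitting this file with the same spark-submit command as HousePriceSolution.py should show the version and install path of the pyspark that the job sees, which can then be compared against the expected Spark 2.x installation.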