I am trying to read csv file using pyspark but its showing some error. Can you tell what is the correct process to read csv file?
python code:
from pyspark.sql import *
df = spark.read.csv("D:\Users\SPate233\Downloads\iMedical\query1.csv", inferSchema = True, header = True)
i tried also below one:
sqlContext = SQLContext
df = sqlContext.load(source="com.databricks.spark.csv", header="true", path = "D:\Users\SPate233\Downloads\iMedical\query1.csv")
error:
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
df = spark.read.csv("D:\Users\SPate233\Downloads\iMedical\query1.csv", inferSchema = True, header = True)
NameError: name 'spark' is not defined
and
Traceback (most recent call last):
File "<pyshell#26>", line 1, in <module>
df = sqlContext.load(source="com.databricks.spark.csv", header="true", path = "D:\Users\SPate233\Downloads\iMedical\query1.csv")
AttributeError: type object 'SQLContext' has no attribute 'load'