0 votes

I would like to create a Spark DataFrame (without double quotes) by reading input from a CSV file, as shown below.

(screenshot of the input CSV, with the values wrapped in quotes)

Here is my code, but it has not worked so far.

val empDF = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("quote", "\"")
  .option("escape", "\"")
  .load("EmpWithQuotes.csv")
  .toDF()

My expected output should not contain the double quotes, but instead I am getting junk characters:

+---+-----+----------+----+
|eno|ename|      eloc|esal|
+---+-----+----------+----+
| 11|�abx�| �chennai�|1000|
| 22|�abr�|     �hyd�|3000|
+---+-----+----------+----+
is it possible to post the exact data instead of an image? – Srinivas

3 Answers

0 votes

I tried this with Spark in Scala and it removed the quotes from the columns:

import org.apache.spark.sql.functions.{col, regexp_replace}

val cleanedDF = df
  .withColumn("ename", regexp_replace(col("ename"), "“", ""))
  .withColumn("eloc", regexp_replace(col("eloc"), "“", ""))
  .withColumn("ename", regexp_replace(col("ename"), "”", ""))
  .withColumn("eloc", regexp_replace(col("eloc"), "”", ""))

There must be something similar in Spark's Python API too.

0 votes

It looks like they are not normal double quotes. You could try to find out which character it actually is and escape it, or, if you are confident every row has a leading and a trailing quote, take the substring (using the SQL expression form here, since the Scala substring function only accepts literal positions):

import org.apache.spark.sql.functions.expr
empDF.withColumn("ename", expr("substring(ename, 2, length(ename) - 2)"))
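
To find out which character it actually is, one option is to print the Unicode code points of a sample value; a minimal sketch, assuming the empDF from the question (typographic quotes would show up as U+201C and U+201D):

// Print each character of the first ename value together with its code point
val sample = empDF.select("ename").head().getString(0)
sample.foreach(c => println(f"'$c' -> U+${c.toInt}%04X"))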
0 votes

If you use Spark's built-in csv format instead of com.databricks.spark.csv, it should work as expected:

import org.apache.spark.sql.functions._

object EscapeQuotes {
  def main(args: Array[String]): Unit = {
    val spark = Constant.getSparkSess
    // Match either the opening or the closing curly quote
    val pattern = "“|”"
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("quote", "\"")
      .option("escape", "\"")
      .csv("src/main/resources/sample.csv")
      .withColumn("eloc", regexp_replace(col("eloc"), pattern, ""))
      .withColumn("ename", regexp_replace(col("ename"), pattern, ""))
      .show()
  }
}
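
Constant.getSparkSess appears to be a custom helper; for a self-contained run, a plain SparkSession could be used instead (a minimal sketch):

import org.apache.spark.sql.SparkSession

// Local session, equivalent in spirit to the helper used above
val spark = SparkSession.builder()
  .appName("EscapeQuotes")
  .master("local[*]")
  .getOrCreate()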