I'm trying to read a CSV file that uses a backslash to escape delimiters instead of quoting fields. I've tried constructing the DataFrameReader without quotes and with an escape character, but it doesn't work: it seems the "escape" option can only be used to escape quote characters. Is there any way around this other than creating a custom input format?
Here are the options that I'm using for now:
spark.read.options(Map(
  "sep" -> ",",
  "encoding" -> "utf-8",
  "quote" -> "",
  "escape" -> "\\",
  "mode" -> "PERMISSIVE",
  "nullValue" -> ""
))
For example, let's say we have the following sample data:
Schema: Name, City
Joe Bloggs,Dublin\,Ireland
Joseph Smith,Salt Lake City\,\
Utah
That should return two records:
Name | City
-----------------|---------------
Joe Bloggs | Dublin,Ireland
Joseph Smith | Salt Lake City,
Utah
Being able to escape newlines would be a nice-to-have, but escaping the column delimiter is required. For now I'm thinking of reading the lines with spark.read.textFile and then using some CSV library to parse the individual lines. That would fix the escaped column delimiter problem, but not escaped row delimiters.
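To illustrate the per-line fallback, here is a minimal sketch of a splitter that honors backslash escapes. It's plain Scala with no Spark dependency (the function name `splitEscaped` and the default separator/escape characters are my own choices, not an existing API), so it could be mapped over the dataset produced by reading the file as text:

```scala
// Split one line on unescaped separators, unescaping as we go.
// An escape character makes the next character literal, so "a\,b" is
// one field containing "a,b" rather than two fields.
def splitEscaped(line: String, sep: Char = ',', esc: Char = '\\'): Seq[String] = {
  val fields  = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var i = 0
  while (i < line.length) {
    val c = line.charAt(i)
    if (c == esc && i + 1 < line.length) {
      // Escaped character: keep it literally, drop the escape itself.
      current.append(line.charAt(i + 1))
      i += 2
    } else if (c == sep) {
      // Unescaped separator: close the current field.
      fields += current.toString
      current.clear()
      i += 1
    } else {
      current.append(c)
      i += 1
    }
  }
  fields += current.toString // last field
  fields.toSeq
}
```

On the sample row this gives `Seq("Joe Bloggs", "Dublin,Ireland")`. It doesn't solve escaped row delimiters either; that would require merging a line ending in an unescaped backslash with the following line before splitting.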