I cannot seem to find the correct Python 3 csv.reader arguments to parse this particular CSV dialect. The tool generating the CSV behaves as follows:
Parser information:
- Quotation character: " (\x22)
- Field Delimiter: ^ (\x5e)
- Record Separator: \n (\x0a)
- Escape Character: \ (\x5c)
How the tool that generates this format works:
- If the specified record separator is found in a field, quote the field.
- If the specified field separator is found in a field, quote the field.
- If the specified quotation character is found in a field, quote the field and escape the quotation character
- If the specified escape character is found in a field, do nothing...
^ This last point is what is causing my issue: the first field of one particular row ends with a backslash, so the Python 3 csv parser interprets the first field delimiter as being escaped.
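The generator's rules listed above can be sketched roughly as follows. This is only an illustration of the behavior as described, not the generator's actual code; the names encode_field and encode_row are hypothetical.

```python
def encode_field(field, delimiter="^", quotechar='"', escapechar="\\",
                 record_sep="\n"):
    """Encode one field using the rules described above (hypothetical helper)."""
    # Quote the field if it contains the delimiter, the quote character,
    # or the record separator.
    needs_quotes = any(ch in field for ch in (delimiter, quotechar, record_sep))
    # Only the quote character gets escaped; a literal escape character
    # (backslash) is written out unchanged.
    body = field.replace(quotechar, escapechar + quotechar)
    return quotechar + body + quotechar if needs_quotes else body


def encode_row(fields, delimiter="^"):
    """Join encoded fields with the field delimiter (hypothetical helper)."""
    return delimiter.join(encode_field(f, delimiter=delimiter) for f in fields)
```

Under these assumptions, a first field consisting of a followed by a backslash is written as-is, which produces the problematic a\^b^c line.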
See below:
(xcve) ttucker@plato:~/tmp/csv$ python --version
Python 3.6.4
(xcve) ttucker@plato:~/tmp/csv$ cat test_csv.py
import csv
with open('exotic_dialect.csv') as f:
    data = f.readlines()
reader = csv.reader(data, delimiter='^', quotechar='"',
                    escapechar='\\', quoting=csv.QUOTE_MINIMAL)
for row in reader:
    print(row)
(xcve) ttucker@plato:~/tmp/csv$ cat exotic_dialect.csv
a^b^c
a|^b^c
"a\""^b^c
"a^"^b^c
a\^b^c
(xcve) ttucker@plato:~/tmp/csv$ python test_csv.py
['a', 'b', 'c']
['a|', 'b', 'c']
['a"', 'b', 'c']
['a^', 'b', 'c']
['a^b', 'c']
^ This last list should have three fields, i.e. ['a\\', 'b', 'c']
So, my questions are:
- Can this CSV dialect be parsed by the standard Python library (with some specific options I can't seem to find)?
- Can this be easily parsed by some Python code? (Also, assume that the first field can end in any printable ASCII character.)
csv module as the escapechar is explicitly defined as escaping the delimiter. It can also escape the quotechar if doublequote is False. So there is no way to just escape the quotechar. - AChampion
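Given that limitation, one option is a small hand-written parser. The following is a minimal sketch, assuming the escape character only has meaning directly before a quote character inside a quoted field and is a literal character everywhere else. The name parse_exotic is hypothetical. Note that the format itself is ambiguous when a quoted field ends in a backslash (the closing quote looks escaped), and this sketch does not try to resolve that case.

```python
def parse_exotic(text, delimiter="^", quotechar='"', escapechar="\\"):
    """Yield rows (lists of strings) parsed from `text` (hypothetical helper)."""
    row, field = [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == escapechar and i + 1 < n and text[i + 1] == quotechar:
                field.append(quotechar)   # escaped quote inside a quoted field
                i += 1                    # skip the quote we just consumed
            elif ch == quotechar:
                in_quotes = False         # closing quote
            else:
                field.append(ch)          # delimiters and newlines are literal
        elif ch == quotechar and not field:
            in_quotes = True              # opening quote at the start of a field
        elif ch == delimiter:
            row.append("".join(field))    # end of field
            field = []
        elif ch == "\n":
            row.append("".join(field))    # end of record
            yield row
            row, field = [], []
        else:
            field.append(ch)              # includes a bare backslash
        i += 1
    if field or row:                      # last record without a trailing newline
        row.append("".join(field))
        yield row
```

It could be used as list(parse_exotic(open('exotic_dialect.csv').read())). Because a quoted field may contain the record separator, the sketch works on the whole text rather than on lines from readlines().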