Reading a CSV file wrapped in multi-level quotation marks with pandas

Question

My .csv file looks like this:

col1, col2, col3, col4, col5, col6
"a, """"b, ""string1"""""", ""string2, string3"", """", c,"
"d, """"e, ""string4"""""", ""string5, string6"", """", f,"

I want to read this file with pandas. How can I handle the following three challenges in a single read_csv call?

1. Undo the single pair of quotation marks " " that wraps each row.
2. Undo the four quotation marks """" """" that wrap cells containing commas.
3. Preserve the commas as part of the string in the cells that contain them.

Why do you need to undo anything? I think that format just means that the column values contain literal quotation marks. — Barmar
read_csv(...) now raises a 'utf-8' codec can't decode error, so I am still looking for the right value for the 'encoding' parameter. Any ideas? — iJup

nicholishen · Accepted Answer · 2018-12-15T10:50:47

You can use str.replace to replace every double quote with an empty string.

>>> x = '"d, """"e, ""string4"""""", ""string5, string6""'
>>> x
'"d, """"e, ""string4"""""", ""string5, string6""'
>>> x.replace('"', '')
'd, e, string4, string5, string6'

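To fix the CSV file in place:

name = 'xxx.csv'

# Read the whole file into memory.
with open(name) as f:
    text = f.read()

# Overwrite the file with every double quote removed.
with open(name, 'w') as f:
    f.write(text.replace('"', ''))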
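If you would rather not overwrite the file, the same replacement can be tried in memory first. The following is only a minimal sketch: the RAW string is the sample from the question pasted inline, and strip_double_quotes is a hypothetical helper name, not part of pandas or the csv module. It uses the standard-library csv reader to show how many fields each row has once the quotes are gone.

```python
import csv
import io

# Sample text copied from the question (header plus the two data rows).
RAW = (
    'col1, col2, col3, col4, col5, col6\n'
    '"a, """"b, ""string1"""""", ""string2, string3"", """", c,"\n'
    '"d, """"e, ""string4"""""", ""string5, string6"", """", f,"\n'
)


def strip_double_quotes(text):
    """Hypothetical helper: remove every double-quote character from text."""
    return text.replace('"', '')


cleaned = strip_double_quotes(RAW)

# skipinitialspace=True drops the blank that follows each comma.
rows = list(csv.reader(io.StringIO(cleaned), skipinitialspace=True))

# Print the field count next to each parsed row to check the alignment.
for row in rows:
    print(len(row), row)
```

On this sample the header has six names, but each data row splits into eight fields once the quotes are removed, because the commas that were inside quoted cells now act as separators. The cleaned text can also be handed to pandas.read_csv through io.StringIO(cleaned), but it is worth checking the column alignment before relying on the result.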
