My .csv file looks like this:

col1, col2, col3, col4, col5, col6
"a, """"b, ""string1"""""", ""string2, string3"", """", c,"
"d, """"e, ""string4"""""", ""string5, string6"", """", f,"

I want to read this file with pandas. How can I handle these three challenges in a single read_csv call?

  • Unwrap each row from the one pair of double quotes (" ... ") that encloses the whole line?
  • Unwrap the cells that contain commas from the quadrupled quotation marks ("""" ... """")?
  • Preserve the commas inside those cells as part of the string, rather than treating them as delimiters?

Why do you need to undo anything? I think that format just means that the column values contain literal quotation marks. - Barmar
Any decent CSV library should parse it correctly. - Barmar
read_csv(...) now raises a "'utf-8' codec can't decode" error, so I am still looking for the right value for the 'encoding' parameter. Any ideas? - iJup

1 Answer

You can use str.replace to replace every double quote with an empty string.

>>> x = '"d, """"e, ""string4"""""", ""string5, string6""'
>>> x
'"d, """"e, ""string4"""""", ""string5, string6""'
>>> x.replace('"', '')
'd, e, string4, string5, string6'
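
For comparison, here is a minimal sketch of how Python's standard csv module sees one row from the question. The outer pair of quotes turns the whole line into a single field, and each doubled quote inside it is an escaped literal quote. The last two lines show what the replace approach leaves behind: plain comma-separated text, so the quotes that grouped "string2, string3" into one cell are gone as well. The variable names are only for illustration.

```python
import csv

# First data row from the question, copied verbatim.
line = '"a, """"b, ""string1"""""", ""string2, string3"", """", c,"'

# The csv module treats the outer quotes as wrapping one field
# and collapses every "" inside it to a single literal ".
row = next(csv.reader([line]))
print(len(row))  # 1
print(row[0])    # a, ""b, "string1""", "string2, string3", "", c,

# Stripping every quote instead leaves plain comma-separated text.
stripped = line.replace('"', '')
print(stripped)  # a, b, string1, string2, string3, , c,
```

If the commas inside cells have to survive, stripping all quotes may not be enough on its own, because nothing marks the cell boundaries afterwards.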

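To fix the CSV file in place (this overwrites the original):

name = 'xxx.csv'

# Read the whole file into memory.
with open(name) as f:
    text = f.read()

# Write it back to the same path with every double quote removed.
with open(name, 'w') as f:
    f.write(text.replace('"', ''))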
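
If you would rather keep the original file, or you hit the decode error mentioned in the comments, the same idea can be sketched as a small helper that writes to a separate path and takes the encoding explicitly. The function name strip_quotes and the output filename below are hypothetical, and the file's real encoding is not known from the question, so any value passed for it is a guess to be checked against the data.

```python
from pathlib import Path


def strip_quotes(src, dst, encoding="utf-8"):
    """Copy src to dst with every double quote removed (hypothetical helper)."""
    # Decode with the given encoding; a wrong guess raises UnicodeDecodeError
    # or silently produces wrong characters, depending on the codec.
    text = Path(src).read_text(encoding=encoding)
    # Same replacement as above, but written to a different file.
    Path(dst).write_text(text.replace('"', ''), encoding=encoding)
```

For example, strip_quotes('xxx.csv', 'xxx_clean.csv', encoding='latin-1') would leave xxx.csv untouched; 'latin-1' here is only an assumed candidate, not something the question confirms.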