I'm trying to use the csv module to parse a specifically formatted delimited file. I'm using Python 3.5.
The format is provided by a third party, and I am having trouble getting the csv module to produce a correct representation in all cases. Columns specified as a text data type have double-quoted values. Dates and numbers have no quotes between the pipes (the delimiter). With every combination of settings I have tried, I either end up with a stray double quote in the middle of a field, or I lose information, e.g. a backslash gets turned into an empty space. I'm hoping I don't have to use regular expressions for this, so if there is a way to do it with the csv module, that'd be great.
Rules:
- escape character: \
- tab escape : \t
- new line character: \n
- backslash character: \\
- inner quote character: \"
- delimiter = |
- dates have no quotes.
- numbers, including NaN values (empty pipe, ||), have no quotes
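For reference, the rules above can be written down as a small hand-rolled splitter that uses neither the csv module nor regular expressions. This is only a sketch under a few assumptions that the rules do not state: each record sits on one physical line, escapes are decoded only inside quoted fields, unquoted fields are kept verbatim, and an unknown escape is left untouched. The names `ESCAPES` and `parse_line` are made up for illustration.

```python
# Escape sequences listed in the rules; anything else is kept verbatim (assumption).
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\", '"': '"'}


def parse_line(line, delimiter="|"):
    """Split one record into fields, decoding escapes inside quoted fields."""
    fields = []        # completed fields
    buf = []           # characters of the field currently being built
    in_quotes = False
    line = line.rstrip("\r\n")
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            # Decode a known escape, or keep an unknown one as-is.
            buf.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes   # quotes delimit text, they are not data
        elif ch == delimiter and not in_quotes:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


# Raw string, so the backslashes reach the parser the way they would from a file.
sample = r'"I read the \"WSJ\", did you?"|"Boasberg\\Wheeler"|N\A||2013-04-23|1.3'
print(parse_line(sample))
```

Empty fields come back as empty strings, so converting them to NaN, and converting dates and numbers, would still be a separate step.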
I have tried various dialect parameters, but I can't get this file to parse correctly: backslashes get converted to empty spaces, inner quotes get misplaced, and so on. Is there a way to do this with the csv module, or will I need to do some post-processing or write my own regex?
import csv
import os
dialect_params = {'delimiter': '|'} # help needed here.
newline_sample = '"I went to dinner. \n Then I went to a show."'
quote_sample = '"I read the \"WSJ\", did you?"'
backslash_sample = '"Boasberg\\Wheeler Communications, Inc."'
na_sample = 'N\A'
date_sample = '2013-04-23'
number_sample = '1.3'
text_sample = '|'.join([newline_sample, quote_sample,
                        backslash_sample, na_sample,
                        date_sample, number_sample]) + '\n'
csv.reader(iter([text_sample]), **dialect_params)
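One caveat about the samples: they are ordinary string literals, so Python interprets sequences such as \" and \\ before the csv module ever sees them. A minimal sketch of the difference, assuming the real file contains the backslashes literally (the variable names `plain` and `raw` are only for illustration):

```python
# Regular literal: Python itself consumes the backslash before each quote.
plain = '"I read the \"WSJ\", did you?"'

# Raw literal: the backslashes survive, as they would in text read from a file.
raw = r'"I read the \"WSJ\", did you?"'

print(plain)
print(raw)
print("\\" in plain, "\\" in raw)
```

Writing the samples as raw strings, or reading them from an actual file, keeps the test input faithful to the format described in the rules.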
lines defined. - strubbly
backspace: do you mean backslash? I'm surprised you have any trouble with that: the csv format does not special-case backslash at all, by default. - strubbly
quote_sample: do you really want a literal \" or just a quote (Python will give you the latter)? - strubbly