People have been asking what this question is about, so I will try to sum it up: I am trying to find a way to detect the best-matching format for parsing CSV files. That is probably the best description of what I am trying to do.
I have a CSV file with these contents:
710000 8454889 03 3 ;sometext;;48,05;65,82;;65,82
710001 8454889 03 3 ;sometext;;49,09;66,96;;66,96
710002 8454889 03 3 ;sometext;;12,63;17,22;;17,22
There are no quote characters, and ";" is the delimiter.
I have registered several csv reader dialects:
    csv.register_dialect('excel', delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, strict=True, skipinitialspace=True)
    csv.register_dialect('semicolonquotes', delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, strict=True, skipinitialspace=True)
    csv.register_dialect('semicolonnonquotes', delimiter=';', quotechar=None, quoting=csv.QUOTE_NONE, strict=True, skipinitialspace=True)
And I have a script that tries to figure out which of those formats matches the file contents best. Unfortunately, for this example file it matches the first one, "excel", even though I would like it to match only "semicolonnonquotes".
Edit: The code I use to match the file is much like this:
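    import csv

    # file: an already-open file object for the CSV being tested
    dialects = csv.list_dialects()
    for dialect in dialects:
        file.seek(0)
        reader = csv.reader(file, csv.get_dialect(dialect))
        next(reader)  # hoped to raise csv.Error on a mismatch, but it never does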
This is very simple code to see whether the reader throws an error when reading with the given dialect. It is wrapped in try/except to catch the first dialect that reads without an error. Unfortunately, none of those dialects raises an error.
/Edit
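To make the behaviour reproducible, here is a minimal self-contained sketch of that check, in Python 3 syntax. It assumes the sample is held in a string instead of a file, and it passes the format parameters as keyword arguments rather than registering dialects, so the built-in 'excel' dialect is not overwritten. The names SAMPLE, FORMATS and first_rows are made up for this illustration.

```python
import csv
import io

# Two lines of the sample file, kept in memory for the demonstration.
SAMPLE = (
    "710000 8454889 03 3 ;sometext;;48,05;65,82;;65,82\n"
    "710001 8454889 03 3 ;sometext;;49,09;66,96;;66,96\n"
)

# Format parameters copied from the three dialects above (hypothetical table).
FORMATS = {
    "excel": dict(delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL),
    "semicolonquotes": dict(delimiter=";", quotechar='"', quoting=csv.QUOTE_ALL),
    "semicolonnonquotes": dict(delimiter=";", quotechar=None, quoting=csv.QUOTE_NONE),
}


def first_rows(text, formats=FORMATS):
    """Return {format name: first parsed row}; csv.Error propagates if a reader fails."""
    result = {}
    for name, params in formats.items():
        reader = csv.reader(io.StringIO(text, newline=""), strict=True,
                            skipinitialspace=True, **params)
        result[name] = next(reader)
    return result


rows = first_rows(SAMPLE)
for name, row in rows.items():
    print(name, len(row), row)
```

All three formats read the first row without raising: the comma format yields 4 fields, and both semicolon formats yield the same 7 fields, whether or not quoting is set to QUOTE_ALL.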
I figured that if I set strict to True, the reader would raise an error when a row contains no quote characters. But apparently it does not work like that.
The first dialect matches and gives me CSV rows like:
['710000 8454889 03 3 ;sometext;;48', '05;65', '82;;65', '82']
Is there some way to tune this so that I get the results I want:
['710000 8454889 03 3 ', 'sometext', '', '48,05', '65,82', '', '65,82']
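One direction that might work without extending the library is to score each candidate instead of waiting for an error. The sketch below (Python 3 syntax) is only a heuristic under my own assumptions, not something the csv module provides: a QUOTE_ALL candidate is rejected by hand when its quote character never appears in the text, every row must have the same number of fields, and the candidate producing the most fields wins. CANDIDATES and guess_dialect are hypothetical names.

```python
import csv
import io

# Hypothetical candidate table mirroring the three dialects in the question.
CANDIDATES = {
    "excel": dict(delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL),
    "semicolonquotes": dict(delimiter=";", quotechar='"', quoting=csv.QUOTE_ALL),
    "semicolonnonquotes": dict(delimiter=";", quotechar=None, quoting=csv.QUOTE_NONE),
}


def guess_dialect(text, candidates=CANDIDATES):
    """Return the name of the best-scoring candidate, or None if none fits."""
    best_name, best_width = None, 1
    for name, params in candidates.items():
        # The reader does not enforce QUOTE_ALL, so check for quotes by hand.
        if params["quoting"] == csv.QUOTE_ALL and params["quotechar"] not in text:
            continue
        reader = csv.reader(io.StringIO(text, newline=""), strict=True,
                            skipinitialspace=True, **params)
        try:
            # Collect the distinct field counts of all non-empty rows.
            widths = {len(row) for row in reader if row}
        except csv.Error:
            continue  # malformed under this candidate
        # Accept only one consistent field count, wider than the best so far.
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best_name, best_width = name, width
    return best_name


sample = (
    "710000 8454889 03 3 ;sometext;;48,05;65,82;;65,82\n"
    "710001 8454889 03 3 ;sometext;;49,09;66,96;;66,96\n"
    "710002 8454889 03 3 ;sometext;;12,63;17,22;;17,22\n"
)
print(guess_dialect(sample))
```

For the sample file this picks "semicolonnonquotes", because both quoted candidates are skipped up front. Ties go to whichever candidate is listed first, so the order of the table matters, and a file with a single column will not match anything.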
Edit2
Reading through the docs, it seems that specifying quoting for csv.reader does next to nothing: http://docs.python.org/2.7/library/csv.html#csv.QUOTE_ALL
I guess this is where my problems come from.
/Edit2
Disclaimer: I know that CSV stands for comma-separated values. If there is no way to achieve what I want without extending the existing library, then I will accept that as an answer and force users to use CSV files that contain only commas as delimiters.