1
votes

I have a Python csv.reader() problem. I have a text file containing data that can basically be read by csv.reader, but I need one extra feature that I am afraid is not included: instead of returning the value of the cell, I need it to return a pair (a, b), where a is the cell content and b is True or False depending on whether the reader removed quotes from the cell.

Example: my file contains the following line (not visible here: the cells are tab-delimited):

"123" 123 """123"""

For this line it should return [(123, True), (123, False), ("123", True)]

I can't use quoting = csv.QUOTE_NONE because I have quotes and newlines in the cells.

Any idea how I can "overload" the reader function? Or is there anything else I can use?

1
Unfortunately, the csv module (at least in standard Python) is not a pure Python module but calls C code. You will have to roll your own, but as csv has many corner cases, my advice is to get a Python source distribution and read how it is implemented. You could then write a specific module based on the standard csv, either in C + Python or in plain Python. Alternatively, if you only need a subset of the full CSV specification, a custom dedicated module/function is probably the way to go - Serge Ballesta
@Serge Ballesta Thanks for the explanation! - user2567875
It would help if you edited the question to include the code you have so far. Could you give a link to a sample CSV file (using pastebin, for example)? - Martin Evans

1 Answer

0
votes

Assuming input.csv contains the following line (tab-delimited):

"123"   123 """123"""

You could just parse it without the CSV library as follows:

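def remove_quotes(cell):
    # Strip every leading and trailing double quote from the cell.
    no_quotes = cell.strip('"')
    length_diff = len(cell) - len(no_quotes)

    if length_diff == 0:
        # Nothing was stripped: the cell was not quoted.
        try:
            return (int(cell), False)
        except ValueError:
            return (cell, False)
    elif length_diff == 2:
        # Two quote characters were stripped: treat as a quoted cell.
        try:
            return (int(no_quotes), True)
        except ValueError:
            # Return the unquoted text, not the original cell.
            return (no_quotes, True)
    else:
        # More than one pair of quotes: keep the content as a string.
        return (no_quotes, True)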

with open('input.csv') as f_input:
    for line in f_input:
        line = [remove_quotes(cell) for cell in line.strip().split('\t')]
        print(line)

This would print:

[(123, True), (123, False), ('123', True)]            

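The remove_quotes() function strips the surrounding quotes, uses the change in length to detect whether the cell was quoted, and builds the required tuple accordingly. Note that this approach reads line by line and splits on every tab, so it does not handle quoted cells that themselves contain tabs or newlines.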
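If the cells can contain tabs or newlines inside quotes, as the question mentions, a character-by-character scan is one possible way around the line-splitting limitation. The following is only a rough sketch, not a replacement for the csv module: read_cells is a hypothetical name, it assumes a cell counts as quoted only when it starts with the quote character, that a doubled quote inside a quoted cell stands for one literal quote, and that line endings are plain '\n'. It leaves every cell as a string and does no int conversion.

```python
import io


def read_cells(stream, delimiter='\t', quotechar='"'):
    """Yield each row as a list of (text, was_quoted) pairs.

    Hypothetical helper, a sketch only: it ignores most CSV dialect
    options and assumes '\n' line endings.
    """
    text = stream.read()
    row, buf = [], []
    quoted = False     # the current cell started with a quote
    in_quotes = False  # we are currently inside the quoted part
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quotechar:
                if i + 1 < n and text[i + 1] == quotechar:
                    buf.append(quotechar)  # doubled quote: one literal quote
                    i += 1
                else:
                    in_quotes = False      # closing quote
            else:
                buf.append(ch)             # tabs and newlines kept verbatim
        elif ch == quotechar and not buf and not quoted:
            quoted = in_quotes = True      # opening quote at cell start
        elif ch == delimiter:
            row.append((''.join(buf), quoted))
            buf, quoted = [], False
        elif ch == '\n':
            row.append((''.join(buf), quoted))
            yield row
            row, buf, quoted = [], [], False
        else:
            buf.append(ch)
        i += 1
    # Flush a final row that is not terminated by a newline.
    if buf or quoted or row:
        row.append((''.join(buf), quoted))
        yield row


sample = io.StringIO('"123"\t123\t"""123"""\n')
print(list(read_cells(sample)))
# → [[('123', True), ('123', False), ('"123"', True)]]
```

With a real file, the same function could be called as read_cells(open('input.csv', newline='')), though reading the whole file into memory at once is a simplification that may not suit very large inputs.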