0
votes

I am trying to compare 2 CSV files, where first one only has couple of entries and the other one has 300 lines. The output I am trying to achive is to print the output of the line from the large file if column email from file1.csv matches the column email from file2.csv matches but the date doesn't. I would like to do this using the "CSV" module. Here is an example:

file1.csv (few entries):

Tom,Johns,[email protected],14 May 2017
Imaginary,User,[email protected],23 May 2018

file2.csv (large file):

First,User,[email protected],10 Jan 2018
Second,User,[email protected],13 Feb 2018
Tom,Johns,[email protected],16 Jun 2017
Imaginary,User,[email protected],23 May 2018

result.csv (Desired result)

Tom,Johns,[email protected],16 Jun 2017

I tried achieving this using the function "next" but due to only 2 entries in file1.csv the script stops. I have rewritten the code but now I am getting I/O operation on closed file.

import csv

with open('file1.csv', 'r') as first_csv:
    dialect = csv.excel()
    file1 = csv.reader(first_csv, dialect)

with open('file2.csv', 'r') as second_csv:
        dialect = csv.excel()
        file2 = csv.reader(second_csv, dialect)

writer = csv.writer(open('result.csv', 'w'))

output = set()

for row1 in file1:
        for row2 in file2:
                if (row1[2] == row2[2]) and (row1[3] != row1[3]):
                        writer.writerow(row2)
                        output.append(row2)
2
It may be a lot easier(and faster) to input each of the csv files into separate SQLite tables and then join the tables. More reading: docs.python.org/3/library/sqlite3.html, sqlite.org/docs.html. @StefanosGkouts - Mr.Zeus
@Mr.Zeus I'm pretty new in Python, so still getting my head around the basics. I would prefer to do it using "import csv", apologies for not mentioning this. I have updated the initial post. - Python_Rookie
In that case @Python_Rookie, I will get back to you with the pure Python & csv code, I am just a in the middle of something. - Mr.Zeus
I just realized it is because when you use the withstatement, thus closing the I/O stream when you reach the end of it. I fixed that but there is still an issue with your logic. - Mr.Zeus

2 Answers

0
votes

As @Mr.Zeus already said in the comments, the csv files need to stay open as long as you're reading or writing on them. Therefore, you can't use the with statement here, which closes the files on exit.

Another problem with your code was the (row1[3] != row1[3]) part, which is ofc always False :D

So here's a fixed version. Note the close in the finally block, which makes sure this will always happen, even if an Exception occurs in the try part.

import csv

first_csv = open('file1.csv', 'r')
second_csv = open('file2.csv', 'r')
output_csv = open('result.csv', 'w')

try:
    file1 = csv.reader(first_csv, csv.excel())
    file2 = csv.reader(second_csv, csv.excel())
    writer = csv.writer(output_csv)

    output = []
    for row1 in file1:
        for row2 in file2:
            if (row1[2] == row2[2]) and (row1[3] != row2[3]):
                writer.writerow(row2)
                output.append(row2)
finally:    
    first_csv.close()
    second_csv.close()
    output_csv.close()

print(output)
0
votes

I managed to fix the issue by converting the CSV to list. Here is the final code:

import csv

first_csv = open('file1.csv', 'r')
second_csv = open('file2.csv', 'r')
output_csv = open('result.csv', 'w')

file1 = csv.reader(first_csv, csv.excel())
file2 = csv.reader(second_csv, csv.excel())
writer = csv.writer(output_csv)

output = []
list1 = list(file1)
list2 = list(file2)

for row1 in list1:
    for row2 in list2:
        if (row1[2] == row2[2]) and (row1[3] != row2[3]):
            writer.writerow(row2)
            output.append(row2)

first_csv.close()
second_csv.close()
output_csv.close()

print(output)

Thanks for help everyone.