0
votes

I have two CSV files that I'm trying to compare. I've read them with csv.DictReader, so I now have one dictionary per row from each file. When two elements (those with headers h1 and h2) are the same in both files, I want to compare those dictionaries and print the differences with respect to the second dictionary. Here are sample CSV files.

csv1:

h1,h2,h3
aaa,g0,74
bjg,73,kg9

CSV_new:

h1,h2,h3,h4
aaa,g0,7,
bjg,73,kg9,ahf

I want the output to be something like this, though not exactly as shown below. It should print the modifications, additions and deletions in each dictionary with respect to CSV_new:

{h1:'aaa', h2:'g0' {h3:'74', h4:''}}
{h1:'bjg', h2:'73' {h4:''}

My code, which isn't well developed yet:

import csv
f1 = "csv1.csv"
reader1 = csv.DictReader(open(f1), delimiter=",")
for row1 in reader1:
    row1['h1']
#['%s:%s' % (f, row[f]) for f in reader.fieldnames]
f2 = "CSV_new.csv"
reader2 = csv.DictReader(open(f2), delimiter=",")
for row2 in reader2:
    row2['h1']
if row1['h1'] == row2['h1']:
    print row1, row2
2
I'm incredibly confused by your expected output. - Adam Smith
I'm sorry for my poor explanation. All I want is differences between both the files, with header names. @AdamSmith - abn
Essentially you're asking how to compare dictionaries -- how they were created is irrelevant -- so I suggest that you search for questions and answers on that topic. - martineau
@martineau Those don't really answer my question, or at least I'm not sure if those are what I'm looking for, because I have multiple dictionaries from one CSV file and they don't have any name/variable they're assigned to. - abn
Each row read from each of your csv.DictReader objects is a dictionary variable with a name (row1 or row2). If you stored them in lists, their variable names would become something like list_name[i] where i is an integer variable (or integer constant). - martineau
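
A minimal sketch of what this comment describes, assuming the sample data from the question is inlined as strings (io.StringIO stands in for the real files). The names load_rows, rows1 and rows2 are hypothetical and not from the original post.

```python
import csv
import io

# Sample data from the question, inlined so the sketch runs without files.
CSV1 = "h1,h2,h3\naaa,g0,74\nbjg,73,kg9\n"
CSV_NEW = "h1,h2,h3,h4\naaa,g0,7,\nbjg,73,kg9,ahf\n"


def load_rows(text):
    """Return every row of a CSV document as a list of plain dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows1 = load_rows(CSV1)
rows2 = load_rows(CSV_NEW)

# Each dictionary is now reachable by position, e.g. rows1[0] or rows2[1].
for i, (row1, row2) in enumerate(zip(rows1, rows2)):
    if row1["h1"] == row2["h1"] and row1["h2"] == row2["h2"]:
        print(i, row1, row2)
```

Pairing rows with zip only works when both files list their rows in the same order; otherwise the rows have to be matched by key instead.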

2 Answers

2
votes

If you just want to find the differences, you can use difflib. As an example:

import difflib

# File names taken from the question
fo1 = open("csv1.csv")
fo2 = open("CSV_new.csv")
diff = difflib.ndiff(fo1.readlines(), fo2.readlines())

Then you can write out the differences however you want.
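
To make the difflib suggestion concrete, here is a small self-contained sketch. It assumes the question's sample data is inlined as strings instead of being read from disk; the names old_lines, new_lines and changes are hypothetical.

```python
import difflib

# Sample data from the question, inlined so the sketch runs without files.
CSV1 = "h1,h2,h3\naaa,g0,74\nbjg,73,kg9\n"
CSV_NEW = "h1,h2,h3,h4\naaa,g0,7,\nbjg,73,kg9,ahf\n"

old_lines = CSV1.splitlines(keepends=True)
new_lines = CSV_NEW.splitlines(keepends=True)

# ndiff prefixes each line with "- " (only in the first input),
# "+ " (only in the second), "  " (in both) or "? " (intra-line hints).
diff = list(difflib.ndiff(old_lines, new_lines))

# Keep only the lines that were removed or added, dropping the hints.
changes = [line for line in diff if line.startswith(("- ", "+ "))]
print("".join(changes), end="")
```

Note that this is a plain line-by-line text diff. It knows nothing about headers or columns, so with this sample data every line is reported as changed, including the header row.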

0
votes

This could be what you are looking for, but as mentioned above there is some ambiguity in your description.

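import csv
import sys

A, B = "csv1.csv", "CSV_new.csv"  # file names from the question

with open(A) as fd1, open(B) as fd2:
    a, b = csv.reader(fd1), csv.reader(fd2)
    ha, hb = next(a), next(b)  # header rows
    # Give up unless every column of A also exists in B.
    if not set(ha).issubset(set(hb)):
        sys.exit(1)

    # Map each shared header to its column index in A and in B.
    lookup = {label: (key, hb.index(label)) for key, label in enumerate(ha)}
    # Rows are paired by position, so both files must list them in the same order.
    for rowa, rowb in zip(a, b):
        for key in lookup:
            index_a, index_b = lookup[key]
            if rowa[index_a] != rowb[index_b]:
                print(rowb)  # print the row from B on the first mismatch
                break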