Compare two files report difference in python

Question

I have 2 files called "hosts" (in different directories)

I want to compare them using python to see if they are IDENTICAL. If they are not Identical, I want to print the difference on the screen.

So far I have tried this

hosts0 = open(dst1 + "/hosts","r") 
hosts1 = open(dst2 + "/hosts","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

But when I run this, I get

File "./audit.py", line 34, in <module>
  if lines2 != lines1[i]:
IndexError: list index out of range

Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?

How about calculating a hash? As a shortcut to quickly find out if they are diffrent — matcheek
@MrE I have already seen that one. It does not answer my question. I am a beginner in python and that question talks about hash and exiting as soon as it notices a difference. I don't want to exit. I want to print out all the difference. (thank you btw) — Matt
@user2799617 I will look into difflib but the diff command is a linux command. Python doesn't recognize it..!! — Matt

rbutcher rbutcher · Accepted Answer · 2013-10-02T00:14:21

import difflib

lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()

lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()

# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
    print line

Outputs the following:

--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
 dog
-cat
 bird
 buffalo
-gophers
-hound
+gopher
 horse
+mouse

This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.

You can use n=0 to remove the context.

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    print line

Outputting this:

--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse

But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    for prefix in ('---', '+++', '@@'):
        if line.startswith(prefix):
            break
    else:
        print line

Giving us this output:

+cat
-cat
-gophers
-hound
+gopher
+mouse

Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:

diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']

print 'additions:'
for line in added:
    print line
print
print 'additions, ignoring position'
for line in added:
    if line not in removed:
        print line

Outputting:

additions:
cat
gopher
mouse

additions, ignoring position:
gopher
mouse

You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.

Compare two files report difference in python

5 Answers