36
votes

I have two different files and I want to compare theirs contents line by line, and write their common contents in a different file. Note that both of them contain some blank spaces. Here is my pseudo code:

file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))

FO.close()
file1.close()
file2.close()

However, by doing this, I got lots of blank spaces in my FO file. Seems like common blank spaces are also written. I want to write only the text part. Can somebody please help me.

For example: my first file (file1) contains data:

Config:
Hostname = TUVALU

BT:
TS_Ball_Update_Threshold = 0.2

BT:
TS_Player_Search_Radius = 4

BT:
Ball_Template_Update = 0

while second file (file2) contains data:

Pole_ID      = 2
Width        = 1280
Height       = 1024
Color_Mode   = 0
Sensor_Scale = 1

Tracking_ROI_Size = 4
Ball_Template_Update = 0

If you notice, last two lines of each files are the same, hence, I want to write this file in my FO file. But, the problem with my approach is that, it writes the common blank space also. Should I use regex for this problem? I do not have experience with regex.

7
When you say you want to compare them line by line, do you mean you want to check if the line is in both files or in the same location in both files? Can you post example input and output files? - thegrinner
line should be in both the files (common line). - Sanchit
Does it need to be in the same location? Like if test is in file 1 on line 3, will it match test in file 2 on line six? - thegrinner
@thegrinner I posted an example now. May be now its better. - Sanchit
@thegrinner location does not matter, as I said. It should be the same text, can be located anywhere in the files. - Sanchit

7 Answers

86
votes

This solution reads both files in one pass, excludes blank lines, and prints common lines regardless of their position in the file:

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
13
votes

Yet another example...

from __future__ import print_function #Only for Python2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)

And if you want to eliminate common blank lines, just change the if statement to:

if line1.strip() and line1 == line2:

.strip() removes all leading and trailing whitespace, so if that's all that's on a line, it will become an empty string "", which is considered false.

11
votes

If you are specifically looking for getting the difference between two files, then this might help:

with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
7
votes

If order is preserved between files you might also prefer difflib. Although Robᵩ's result is the bona-fide standard for intersections you might actually be looking for a rough diff-like:

from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")

That said, this has a different behaviour to what you asked for (order is important) even though in this instance the same output is produced.

4
votes

Once the file object is iterated, it is exausted.

>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print line
...
1

2

3

# exausted, another iteration does not produce anything.
>>> for line in f: print line
...
>>>

Use file.seek (or close/open the file) to rewind the file:

>>> f.seek(0)
>>> for line in f: print line
...
1

2

3
1
votes

Try this:

from __future__ import with_statement

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"


with open(filename1) as f1:
   with open(filename2) as f2:
      file1list = f1.read().splitlines()
      file2list = f2.read().splitlines()
      list1length = len(file1list)
      list2length = len(file2list)
      if list1length == list2length:
          for index in range(len(file1list)):
              if file1list[index] == file2list[index]:
                  print file1list[index] + "==" + file2list[index]
              else:                  
                  print file1list[index] + "!=" + file2list[index]+" Not-Equel"
      else:
          print "difference inthe size of the file and number of lines"
0
votes

I have just been faced with the same challenge, but I thought "Why programming this in Python if you can solve it with a simple "grep"?, which led to the following Python code:

import subprocess
from subprocess import PIPE

try:
  output1, errors1 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf" ,"c:\\file1.txt", "c:\\file2.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate();
  output2, errors2 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf" ,"c:\\file2.txt", "c:\\file1.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate();
  if (len(output1) + len(output2) + len(errors1) + len(errors2) > 0):
    print ("Compare result : There are differences:");
    if (len(output1) + len(output2) > 0):
      print ("  Output differences : ");
      print (output1);
      print (output2);
    if (len(errors1) + len(errors2) > 0):
      print (" Errors : ");
      print (errors1);
      print (errors2);
  else:
    print ("Compare result : Both files are equal");
except Exception as ex:
  print("Compare result : Exception during comparison");
  print(ex);
  raise;

The trick behind this is the following: grep -Fvf file1.txt file2.txt verifies if all entries in file2.txt are present in file1.txt. By doing this in both directions we can see if the content of both files are "equal". I put "equal" between quotes because duplicate lines are disregarded in this way of working.

Obviously, this is just an example: you can replace grep by any commandline file comparison tool.