I'm trying to do a chained comparison between two files and print/write out each row whose values fall within the specified interval.

This is what I have so far.

test1 file:

A0AUZ9,7,17 #just this one line

test2 file:

A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930
A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336
A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155

And the script itself:

results = []

with open('test1', 'r') as disorder:
    for lines in disorder:
        cells = lines.strip().split(',')
        with open('test2', 'r') as helpy:
            for lines in helpy:
                blocks = lines.strip().split(',')
                if blocks[0] != cells[0]:
                    continue
                elif cells[1] <= blocks[2] and blocks[3] <= cells[2]:
                    results.append(blocks)                    

with open('test3','wt') as outfile:
    for i in results:
        outfile.write("%s\n" % i)

I would like test3 to contain only the rows from test2 that:

have a matching ID in the first column

have both numerical values in columns 3 and 4 between the two values given in the test1 file

I get no output, and I'm not sure where it goes wrong.

Are the files sorted? - Burhan Khalid
Yes, by the names of the IDs in the first column (alphabetical order) - Márton Oelbei
You will have an issue with spaces here. strip() only removes leading and trailing whitespace from the whole line, not the spaces that follow each comma inside it. replace(" ", "") might help in that regard, as long as spaces can be totally ignored. - Deneb
Oh my, removing the spaces actually solved the problem! Thank you very much! - Márton Oelbei

1 Answer

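One reason it's not working as expected is that you are comparing strings, not numbers: the fields returned by split(',') are strings, and strings are compared character by character rather than by numeric value.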
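To make the difference concrete, here is a small sketch using the values from the sample files. The variable names are chosen for illustration only and do not come from the original script.

```python
# Fields as they come out of lines.strip().split(',') for the sample files
low, high = "7", "17"        # from test1
start, end = " 8", " 16"     # from test2; note the space left after each comma

# Strings are compared character by character, and a space sorts before any digit,
# so "7" <= " 8" is already False
string_result = low <= start and end <= high

# int() ignores surrounding whitespace and compares the actual numbers
int_result = int(low) <= int(start) and int(end) <= int(high)

print(string_result, int_result)   # prints: False True

# Even without stray spaces, string ordering is not numeric ordering
print("9" <= "10", 9 <= 10)        # prints: False True
```

Removing the spaces alone happens to work for this sample, but it can still give wrong answers when the numbers have different lengths, so converting to int is the safer fix.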

However, there may be a better way to do what you are trying to do. Assuming that the first file is small enough to fit in memory:

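import csv
from collections import defaultdict

# Map each ID to the list of (low, high) intervals given for it in test1
lookup_table = defaultdict(list)

with open('test1.txt', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        lookup_table[row[0]].append((int(row[1]), int(row[2])))

with open('test2.txt', newline='') as a, open('results.txt', 'w', newline='') as b:
    # skipinitialspace drops the space that follows each comma in test2
    reader = csv.reader(a, skipinitialspace=True)
    writer = csv.writer(b)

    for row in reader:
        start, end = int(row[2]), int(row[3])
        # .get() with a default avoids adding empty entries for unknown IDs
        for low, high in lookup_table.get(row[0], []):
            # Keep the row if both values lie inside one of the ID's intervals
            if low <= start and end <= high:
                writer.writerow(row)
                break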
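
As an alternative sketch, the nested loop from the question can be kept almost as it is, provided each field is stripped of whitespace and the bounds are converted to int before comparing. The function name filter_rows and the in-memory lists standing in for the two files are assumptions made for illustration; the sample line from test1 is used without its trailing annotation.

```python
def filter_rows(interval_lines, feature_lines):
    """Return the rows of feature_lines whose ID matches an interval line
    and whose columns 3 and 4 lie inside that interval.

    feature_lines must be re-iterable (e.g. a list), since it is scanned
    once per interval line.
    """
    results = []
    for line in interval_lines:
        # Split on commas, then trim whitespace around every field
        cells = [c.strip() for c in line.split(',')]
        low, high = int(cells[1]), int(cells[2])
        for other in feature_lines:
            blocks = [b.strip() for b in other.split(',')]
            if blocks[0] != cells[0]:
                continue
            # Numeric comparison: both values must fall within [low, high]
            if low <= int(blocks[2]) and int(blocks[3]) <= high:
                results.append(blocks)
    return results


test1 = ["A0AUZ9,7,17"]
test2 = [
    "A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930",
    "A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336",
    "A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336",
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685",
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155",
]

matches = filter_rows(test1, test2)
for row in matches:
    print(",".join(row))
```

With the sample data this keeps only the last two rows: the first row has a different ID, and the two LIG_BRCT_BRCA1 rows fall outside the 7 to 17 interval.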