Your description is a little ambiguous. Setting aside the check for "1,2,3" for a moment, you talk of comparing columns 1 and 2, but column 1 holds the same value ("chr") on every line of both files. Since you've highlighted the numbers in columns 2 & 3, and those are what appear in the "Output.txt" file, I presume you mean those two columns rather than 1 and 2. That's the basis I'm proceeding on.
Before moving on to the solution, I just want to highlight a couple of problems with your existing code. Firstly, you are concatenating the two columns into a single string. What if columns 2 & 3 hold "46" & "123" respectively in one file, and "461" & "23" in the other? Both concatenate to "46123", so your key gives you a false match. Now maybe that just "ain't going to happen", and if you know your data that well, then fair enough, but you need to be aware of the possibility.
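To make the false match concrete, here is a minimal sketch. The helper names `naive_key` and `safe_key` are mine, not from your code, and joining on a tab is an assumption that is safe only because the fields were split on tabs in the first place.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# Plain concatenation, as in the existing code.
sub naive_key { my ($x, $y) = @_; return "$x$y" }

# Same idea, but with a separator that cannot occur inside a field.
sub safe_key { my ($x, $y) = @_; return join "\t", $x, $y }

my $naive_clash = naive_key(46, 123) eq naive_key(461, 23);
my $safe_clash  = safe_key(46, 123)  eq safe_key(461, 23);

print "naive keys collide: ", ($naive_clash ? "yes" : "no"), "\n";
print "safe keys collide:  ", ($safe_clash  ? "yes" : "no"), "\n";
```

With the separator in place, "46" & "123" and "461" & "23" produce different keys.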
More importantly, the hash keeping track of the numbers previously seen is insufficient for the task you need of it. What happens if there are two lines with the same content in columns 2 & 3 in the same file? What happens if there are two identical lines in one file and one matching line in the other, giving a total of 3 when you're only looking for a tally of 2? Again, you may know that these combinations are not going to show up in your data, but you need to be aware of the lurking bug.
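One way around the tally problem is to count occurrences per file rather than overall. The following is only a sketch under assumptions: the `tally` helper, the 'file1'/'file2' labels and the sample rows are all made up for illustration, and the rows are passed in as strings rather than read from your files.

```perl
use strict;
use warnings;

# Hypothetical helper: record how often each (col2, col3) pair occurs per file.
sub tally {
    my ($seen, $label, @lines) = @_;
    for my $line (@lines) {
        my @f = split /\t/, $line;
        $seen->{ join "\t", @f[1, 2] }{$label}++;
    }
    return $seen;
}

# Made-up sample rows: the pair (100, 200) is duplicated within file 1 only.
my %seen;
tally(\%seen, 'file1', "chr\t100\t200", "chr\t100\t200", "chr\t300\t400");
tally(\%seen, 'file2', "chr\t300\t400");

for my $key (sort keys %seen) {
    # A pair only counts as a match if it was seen in BOTH files.
    my $in_both = $seen{$key}{file1} && $seen{$key}{file2};
    printf "%s => %s\n", join(",", split /\t/, $key),
        $in_both ? "in both files" : "one file only";
}
```

A single running total would reach 2 for the duplicated pair and report a match that isn't there; keeping the counts separate per file avoids that.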
One other thing: it's not clear (to me, at least) whether columns 2 & 3 have to match on the same line number in each file. In your test data, columns 2 & 3 on lines 4 & 5 match lines 4 & 5 respectively in the other file. Is that necessary? Or (again, setting aside the "1,2,3" thing for a minute) can columns 2 & 3 on line 4 of the first file happily match columns 2 & 3 on line 7 of the second?
I don't mean to be difficult here but obviously these things are very relevant to finding the right solution.
If you want the minimal change to your existing code, because none of the things I'm pointing out are going to matter, all you need to do is skip the rest of the first loop's body unless "1,2,3" is in column 5, that is $arr1[4] or, after the split, $hit5. So just add exactly that:
chomp;
my($hit1,$hit2,$hit3,$hit4,$hit5,$rest)=split(/\t/);
next unless $hit5 eq "1,2,3"; # <-- Added line: column 5 is $hit5
my $ckey="$hit1$hit2";
$chash{$ckey}=1;
'next' immediately ends the current iteration of the loop, so for that line $chash will not get updated with the contents of columns 2 & 3. But, I have to repeat, the end result is pretty precarious code.
Here is an alternative implementation:
#!/usr/bin/env perl
use v5.12;
use warnings;

die "Usage: $0 file1 file2\n" unless @ARGV == 2;
my ($file1, $file2) = @ARGV;

# Lexical filehandles and three-argument open
open my $in1, '<', $file1 or die "$file1: $!\n";
open my $in2, '<', $file2 or die "$file2: $!\n";
open my $f, '>', "output.txt" or die "Cannot open output.txt: $!\n";

# Read each file into an array of rows; each row is a ref to an array of fields
my @arr1 = map [split(" ", $_)], <$in1>;
my @arr2 = map [split(" ", $_)], <$in2>;
close $in1;
close $in2;

my $i = 0;
for my $arr1row (@arr1) {
    # Grab the same row in file 2
    my $arr2row = $arr2[$i++];

    # Skip unless we have "1,2,3" in col 5
    next unless defined $arr1row->[4] && $arr1row->[4] eq "1,2,3";

    # Skip if we don't have a line from file 2 because it's shorter
    next unless defined $arr2row;

    # If col2 and col3 are the same in each file (compared as numbers) ...
    if ($arr1row->[1] == $arr2row->[1] &&
        $arr1row->[2] == $arr2row->[2]) {
        # ... print out all fields from file 2
        say $f join("\t", @$arr2row);
    }
}
close $f or die "Cannot close output.txt: $!\n";
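The implementation above compares line N of one file with line N of the other. If it turns out the match does not have to be on the same line, a hash lookup is one possible approach. This is a sketch only: `match_any_line` is a hypothetical helper, the rows are made-up data already split into fields, and it assumes every row has at least three columns.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rows of file 2 whose columns 2 and 3 match
# any "1,2,3" row of file 1, regardless of line position.
sub match_any_line {
    my ($rows1, $rows2) = @_;
    my %wanted;
    for my $row (@$rows1) {
        next unless defined $row->[4] && $row->[4] eq "1,2,3";
        $wanted{ join "\t", @{$row}[1, 2] } = 1;
    }
    return grep { exists $wanted{ join "\t", @{$_}[1, 2] } } @$rows2;
}

# Made-up rows for illustration; the matching pair sits on different lines.
my @file1 = ( ['chr', 10, 20, 'x', '1,2,3'], ['chr', 30, 40, 'x', '1,2'] );
my @file2 = ( ['chr', 30, 40, 'y', 'z'],     ['chr', 10, 20, 'y', 'z'] );

print join("\t", @$_), "\n" for match_any_line(\@file1, \@file2);
```

Note that this compares the columns as strings, so "010" and "10" would not match here, whereas the numeric `==` comparison above would treat them as equal.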