0
votes

I have two tab-delimited tables:

table1

col1    col2    col3    col4
id1     1       1       10
id2     1       15      20
id3     1       30      35
id4     2       10      15


table2

col1    col2    col3
rs1     5       1
rs2     11      1
rs3     34      1
rs4     35      1

I first want to check if there is a match between value in col3-table2 and col2-table1. If this is TRUE then I want to check if there are values in col2-table2 which are between the values in col3 & col4 - table1. If this is the case, I want to print out the corresponding value(s) of col1 & col2 into a new column of table1.

So in this example, the final results file should look like this:

 table output
 col1    col2   col3   col4   new_col1    
 id1     1      1      10     rs1:5
 id2     1      15     20     
 id3     1      30     35     rs3:34, rs4:35    
 id4     2      10     15

After opening and loading the files, I started with storing the values of table2 in arrays of arrays.

my @table2;
    while (<$table2>){
        next if /^\s*#/; #to skip header lines
        my @columns = split;
        next if $columns[1] =~ /\D/;
        push @table2, \@columns;
        }

while (<$table1>){
     my @columns = split;
     ... 
}

How can I now check if there is a match between value in col3-table2 and col2-table1. And how then to proceed with checking if there are values in col2-table2 which are between the values in col3 & col4 - table1.

1
please read perldoc perldsc and perllolVorsprung
possible duplicate of find values in a range of 2 valuesSobrique

1 Answers

1
votes

Fortunately, I still have the code from last time in my notepad.

I've made a couple of updates, based on the changed requirement. This should do what you ask. (Feeding the table data in without inlining it is left as an exercise for the reader).

use strict;
use warnings;
use Data::Dumper;

my %table2;

while (<DATA>) {

    #stop reading if we've finished with table2
    last if m/^table1/;

    next unless m/^rs/;
    my ( $col1, $col2, $col3 ) = split(/\s+/);
    $table2{$col1}{$col3} = $col2;
}

print "The story so far...:\n";
print Dumper \%table2;


print "table output\n";
print join( "\t", qw ( col1    col2   col3   col4   new_col1 ) ), "\n";
while (<DATA>) {
    next unless m/^id/;
    chomp;
    my ( $rowid, $col2, $lower, $upper ) = split(/\s+/);
    my $newcol = "";
    foreach my $rs ( keys %table2 ) {
        if ( defined $table2{$rs}{$col2}
            and $table2{$rs}{$col2} >= $lower
            and $table2{$rs}{$col2} <= $upper )
        {
            $newcol .= " $rs:$table2{$rs}{$col2}";
        }
    }
    print join( "\t", $rowid, $col2, $lower, $upper, $newcol, ), "\n";
}


__DATA__
table2
col1    col2    col3
rs1     5       1
rs2     11      1
rs3     34      1
rs4     35      1

table1
col1    col2    col3    col4
id1     1       1       10
id2     1       15      20
id3     1       30      35
id4     2       10      15

This gives output of:

The story so far...:
$VAR1 = {
          'rs3' => {
                     '1' => '34'
                   },
          'rs4' => {
                     '1' => '35'
                   },
          'rs2' => {
                     '1' => '11'
                   },
          'rs1' => {
                     '1' => '5'
                   }
        };

table output
col1    col2    col3    col4     new_col1
id1     1       1       10       rs1:5
id2     1       15      20  
id3     1       30      35       rs3:34 rs4:35
id4     2       10      15