3
votes

I have a hash which should contain certain keys which are linked to their own arrays. To be more specific, the hash keys are quality values and the arrays are sequence names. If there already is an array for that quality, I'd want to add the sequence name to the array that is linked to the quality in question. If there isn't one, I want to create one and add the sequence name to it. All this is done in a while loop, going through all the sequences one by one.

I've tried to do things like in "Perl How do I retrieve an array from a hash of arrays?", but I can't seem to get it right.

I just get these error messages: Scalar value @{hash{$q} better written as ${hash{$q} at asdasd.pl line 69. Global symbol "@q" requires explicit package name asdasd.pl line 58. And some others, too.

Here is an example of what I've tried:

my %hash;
while (reading the sequences) {
    my $q = "the value the sequence has";
    my $seq = "the name of the sequence";

    if (exists $hash{$q}) {
        push (@{$hash{$q}}, $seq);
    } else {
        $hash{$q} = \@q;
        $hash{$q} = [$seq];
        next;
    }
}

This obviously shouldn't be a very complicated problem, but I'm new to Perl and this kind of problem feels difficult. I've googled this in various places, but there seems to be something I just don't realize, and it might be really obvious, too.

2
The array @q is not defined. Please include it with your code. - shawnhcorey
Umm, yeah, it's not defined anywhere, which might cause problems. I'm still confused about how you can (and sometimes need to) switch from % or @ to $ in different situations without having to define anything; sometimes I get errors, and sometimes I'm meant to do that. Anyway, I got this to work with the tip that I only really need the "push" statement. - user2500878
Please post working code. It is very difficult to answer your questions otherwise. - shawnhcorey
The answer that I accepted was the solution. I don't have any more code related to this problem. :) - user2500878
There's a fairly simple explanation of % vs @ vs $ at perldoc.perl.org/perldata.html#Variable-names. The code you had should have been fine, except for the unneeded line that used @q; not sure what that was trying to do. - ysth
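
To illustrate the sigil rule that the perldata page describes, here is a minimal sketch. The hash contents and variable names are made up for the example; the point is only that the sigil follows what the expression returns (one value: $, a list: @), not the type of the variable it is taken from.

```perl
use strict;
use warnings;

# Hypothetical data: one quality value mapped to two sequence names.
my %hash = ( 30 => [ 'seq1', 'seq3' ] );

my $ref   = $hash{30};          # one element of %hash (an array reference), so $
my @names = @{ $hash{30} };     # the whole array behind that reference, so @
my $first = ${ $hash{30} }[0];  # one element of that array, so $ again
my $same  = $hash{30}[0];       # shorter spelling of the previous line

print "$first $same ", scalar(@names), "\n";
# prints
# seq1 seq1 2
```

This is also why Perl suggests the $ form in the "Scalar value ... better written as" warning: a single element was being requested with @.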

2 Answers

4
votes

You can use what perl calls autovivification to make this quite easy. Your code doesn't need that central if-statement. You can boil it down to:

    push @{ $hash{$q} }, $seq;

If the key doesn't yet exist in the hash, Perl will autovivify it: because you dereference $hash{$q} as an array, Perl infers that you want an array reference there and creates one for you.
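
As a rough sketch of how that one line behaves inside a loop, the example below uses a hard-coded list of (quality, sequence name) pairs in place of the real input. The names @records, seq1, seq2 and seq3 are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical (quality, sequence name) pairs standing in for the real input.
my @records = (
    [ 30, 'seq1' ],
    [ 20, 'seq2' ],
    [ 30, 'seq3' ],
);

my %hash;
for my $record (@records) {
    my ($q, $seq) = @$record;
    # No exists check: the first push for a new $q creates the array reference.
    push @{ $hash{$q} }, $seq;
}

# Sort numerically so the output order is predictable.
for my $q ( sort { $a <=> $b } keys %hash ) {
    print "$q : ", join(', ', @{ $hash{$q} }), "\n";
}
# prints
# 20 : seq2
# 30 : seq1, seq3
```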

You can find further resources on autovivification by Googling it. It's a unique enough word that the vast majority of the hits seem relevant. :-)

2
votes

You are actually pretty close, a few notes though:

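  1. In your else block you assign a reference to @q to the hash and then immediately overwrite it with [$seq]; only the last assignment takes effect. Since @q is never declared, that first line is also what triggers the 'Global symbol "@q" requires explicit package name' error.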

  2. You don't need next at the end of your loop; the loop moves on to the next iteration automatically once there are no more statements left in the body.

Everything else seems to work fine. Here are my revisions and the test data I used (since I don't know anything about DNA sequences, I just used letters I remember from high school biology).

Input file:

A 1
T 2
G 3 
A 3
A 2
G 5
C 1
C 1
C 2
T 4

Code:

use strict;
use warnings FATAL => 'all';

# open file for reading
open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";

my %hash;
while ( my $line = <$fh> ) { # read a line

    # split the line on whitespace: the first field becomes the hash key,
    # the second is the value stored in that key's array
    my ($q, $seq) = split(/\s+/, $line);

    if( exists $hash{$q} ) {
        push @{ $hash{$q} }, $seq;
    } 
    else {
        $hash{$q} = [$seq];
    }
}

# print the resulting hash (keys sorted, since hash order is not guaranteed)
for my $k ( sort keys %hash ) {
   print "$k : ", join(', ', @{$hash{$k}}), "\n";
}

# prints
# A : 1, 3, 2
# C : 1, 1, 2
# G : 3, 5
# T : 2, 4