0
votes

I’ve been struggling with converting *.xls files generated by an instrument to text format on a Linux server. I’m unable to process the files with Spreadsheet::ParseExcel, unless, I manually open them, sign off on the security warning and save them. Otherwise, they are not recognized as Excel (tested with the sample code).

!/usr/bin/perl -w

    use strict;
    use Spreadsheet::ParseExcel;

    my $parser   = Spreadsheet::ParseExcel->new();
    my $file = "/data/excel/matrix.xls";

    my $workbook = $parser->parse($file);

    if ( !defined $workbook ) {
        print "can't find workbook!!!";
        die $parser->error(), ".\n";
    }

    for my $worksheet ( $workbook->worksheets() ) {

        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();

        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {

                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;

                print "Row, Col    = ($row, $col)\n";
                print "Value       = ", $cell->value(),       "\n";
                print "Unformatted = ", $cell->unformatted(), "\n";
                print "\n";
            }
        }
    }

I’ve tried changing the extension to *.prn and it lets me open the files manually without a warning but they are not recognized by the Spreadsheet::ParseExcel either.

The files contain 8 columns of data on the first sheet only. I would like to convert them to text files and used them to look up values in my Perl script. Here is some sample data in excel:

Gene   Target  Barcode1   Barcode2   Barcode3   Barcode4   Barcode5   Barcode6
MOTOR  MOTOR_1  343        453        432        345        543        342
MYCN   MYCN_2   342        98         87         876        54         765

My last option is to use VBA but I would reader stick to Perl/Shell code if possible. Is there a straightforward solution to this problem?

Thank you,

1

1 Answers

2
votes

It's not particularly elegant, but you might try using the Linux command "strings" to extract just the printable characters from your spreadsheet file first. Then you could parse the output until you see the column headings, and the data should be after that.