1
votes

The following Perl script generates an .xls file from a text file. It runs great in our linux test environment, but generates an empty spreadsheet (.xls) in our production environment when run via cron (cron works in test, as well.) Nothing jumps out at our sys admins in terms of system level settings that might account for this behavior. Towards the bottom of the script in the import_data subroutine, the correct number of lines is reported, but nothing is written to the spreadsheet and no errors are returned at either the script or system level. I ran it through the perl debugger but my skills fell short of being able to interactively watch it populate the file. The cron entry looks like this:

cd <script directory>; cvs2xls input.txt output.xls 2>&1

Any debugging tips would be appreciated, as well as potential system settings that I can forward on to our sysadmins.

#!/usr/bin/perl
use strict;
use warnings;

use lib '/apps/tu01688/perl5/lib/perl5';

use Spreadsheet::WriteExcel;
use Text::CSV::Simple;

BEGIN {
  unshift @INC, "/apps/tu01688/jobs/mayo-expert";
};

my $infile = shift;
usage()  unless defined $infile && -f $infile;
my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file($infile);
my $headers = shift @data;

my $outfile = shift || $infile . ".xls";
my $subject = shift || 'worksheet';

sub usage {
    print "csv2xls infile [outfile] [subject]\n";
    exit;
}

my $workbook = Spreadsheet::WriteExcel->new($outfile);
my $bold = $workbook->add_format();
$bold->set_bold(1);
import_data($workbook, $subject, $headers, \@data);

# Add a worksheet
sub import_data {
    my $workbook  = shift;
    my $base_name = shift;
    my $colums    = shift;
    my $data      = shift;
    my $limit     = shift || 50_000;
    my $start_row = shift || 1;
    my $worksheet = $workbook->add_worksheet($base_name);
    $worksheet->add_write_handler(qr[\w], \&store_string_widths);
    #$worksheet->add_write_handler(qr[\w]| \&store_string_widths);
    my $w = 1;
    $worksheet->write('A' . $start_row, $colums, ,$bold);
    my $i = $start_row;
    my $qty = 0;
    for my $row (@$data) {
        $qty++;
        if ($i > $limit) {
             $i = $start_row;
             $w++;
             $worksheet = $workbook->add_worksheet("$base_name - $w");
             $worksheet->write('A1', $colums,$bold);
        }
        $worksheet->write($i++, 0, $row);
    }
    autofit_columns($worksheet);
    warn "Converted $qty rows.";
    return $worksheet;
}


###############################################################################
###############################################################################
#
# Functions used for Autofit.
#

###############################################################################
#
# Adjust the column widths to fit the longest string in the column.
#
sub autofit_columns {

    my $worksheet = shift;
    my $col       = 0;

    for my $width (@{$worksheet->{__col_widths}}) {

        $worksheet->set_column($col, $col, $width) if $width;
        $col++;
    }
}


###############################################################################
#
# The following function is a callback that was added via add_write_handler()
# above. It modifies the write() function so that it stores the maximum
# unwrapped width of a string in a column.
#
sub store_string_widths {

    my $worksheet = shift;
    my $col       = $_[1];
    my $token     = $_[2];

    # Ignore some tokens that we aren't interested in.
    return if not defined $token;       # Ignore undefs.
    return if $token eq '';             # Ignore blank cells.
    return if ref $token eq 'ARRAY';    # Ignore array refs.
    return if $token =~ /^=/;           # Ignore formula

    # Ignore numbers
    #return if $token =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;

    # Ignore various internal and external hyperlinks. In a real scenario
    # you may wish to track the length of the optional strings used with
    # urls.
    return if $token =~ m{^[fh]tt?ps?://};
    return if $token =~ m{^mailto:};
    return if $token =~ m{^(?:in|ex)ternal:};


    # We store the string width as data in the Worksheet object. We use
    # a double underscore key name to avoid conflicts with future names.
    #
    my $old_width    = $worksheet->{__col_widths}->[$col];
    my $string_width = string_width($token);

    if (not defined $old_width or $string_width > $old_width) {
        # You may wish to set a minimum column width as follows.
        #return undef if $string_width < 10;

        $worksheet->{__col_widths}->[$col] = $string_width;
    }


    # Return control to write();
    return undef;
}


###############################################################################
#
# Very simple conversion between string length and string width for Arial 10.
# See below for a more sophisticated method.
#
sub string_width {

    return length $_[0];
}
2
Did you check the logs from the cron job? Any messages or errors? If you can't get any job output from cron, maybe change the end of the cron job from 2>&1 to >>/tmp/my-cron-job.log 2>&1 and then monitor that file.Stefan Becker
BTW: you should also be able to create a user cron job on your Linux test machine (crontab -e). Maybe you can already reproduce the issue there?Stefan Becker
Adding MAILTO=user@machine (with your email for instance) to the top of the crontab may recover some messages (that are emitted but now go to some forsaken account/log)zdim
Is cvs2xls a typo in your cron entry, or just in the question here?Jim Davis
I've learned the hard way (several times 😉): Always use absolute path names for everything in crontab. /path/to/perl /path/to/cvs2xls /path/to/input.txt /path/to/output.xls. Also don't assume any environment variables will be the same as at the command line, including PATH and PERL5LIB (if necessary, add -I/path/to/libs).haukex

2 Answers

2
votes

Uhmm.. don't put chained commands in cron, use an external script instead. Anyway: some suggestions that may help you:

Debugging cron commands

Check the mail! By default cron will mail any output from the command to the user it is running the command as. If there is no output there will be no mail. If you want cron to send mail to a different account then you can set the MAILTO environment variable in the crontab file e.g.

[email protected]
1 2 * * * /path/to/your/command

Capture the output yourself

1 2 * * *  /path/to/your/command &>/tmp/mycommand.log

which captures stdout and stderr to /tmp/mycommand.log

Look at the logs; cron logs its actions via syslog, which (depending on your setup) often go to /var/log/cron or /var/log/syslog.

If required you can filter the cron statements with e.g.

grep CRON /var/log/syslog 

Now that we've gone over the basics of cron, where the files are and how to use them let's look at some common problems.

Check that cron is running

If cron isn't running then your commands won't be scheduled ...

ps -ef | grep cron | grep -v grep

should get you something like

root    1224   1  0 Nov16 ?    00:00:03 cron

or

root    2018   1  0 Nov14 ?    00:00:06 crond

If not restart it

/sbin/service cron start

or

/sbin/service crond start

There may be other methods; use what your distro provides.

cron runs your command in a restricted environment.

What environment variables are available is likely to be very limited. Typically, you'll only get a few variables defined, such as $LOGNAME, $HOME, and $PATH.

Of particular note is the PATH is restricted to /bin:/usr/bin. The vast majority of "my cron script doesn't work" problems are caused by this restrictive path. If your command is in a different location you can solve this in a couple of ways:

  1. Provide the full path to your command.

    1 2 * * * /path/to/your/command
    
  2. Provide a suitable PATH in the crontab file

    PATH=/usr:/usr/bin:/path/to/something/else
    1 2 * * * command 
    

If your command requires other environment variables you can define them in the crontab file too.

cron runs your command with cwd == $HOME

Regardless of where the program you execute resides on the filesystem, the current working directory of the program when cron runs it will be the user's home directory. If you access files in your program, you'll need to take this into account if you use relative paths, or (preferably) just use fully-qualified paths everywhere, and save everyone a whole lot of confusion.

The last command in my crontab doesn't run

Cron generally requires that commands are terminated with a new line. Edit your crontab; go to the end of the line which contains the last command and insert a new line (press enter).

Check the crontab format

You can't use a user crontab formatted crontab for /etc/crontab or the fragments in /etc/cron.d and vice versa. A user formatted crontab does not include a username in the 6th position of a row, while a system formatted crontab includes the username and runs the command as that user.

I put a file in /etc/cron.{hourly,daily,weekly,monthly} and it doesn't run

  • Check that the filename doesn't have an extension see run-parts
  • Ensure the file has execute permissions.
  • Tell the system what to use when executing your script (eg. put #!/bin/sh at top)

Cron date related bugs

If your date is recently changed by a user or system update, timezone or other, then crontab will start behaving erratically and exhibit bizarre bugs, sometimes working, sometimes not. This is crontab's attempt to try to "do what you want" when the time changes out from underneath it. The "minute" field will become ineffective after the hour is changed. In this scenario, only asterisks would be accepted. Restart cron and try it again without connecting to the internet (so the date doesn't have a chance to reset to one of the time servers).

Percent signs, again

To emphasise the advice about percent signs, here's an example of what cron does with them:

# cron entry
* * * * * cat >$HOME/cron.out%foo%bar%baz

will create the ~/cron.out file containing the 3 lines

foo
bar
baz

This is particularly intrusive when using the date command. Be sure to escape the percent signs

* * * * * /path/to/command --day "$(date "+\%Y\%m\%d")"
0
votes

Thanks so much for the extensive feedback, everyone. I'm certainly taking a lot more away from this than I put into it. In any event, I ran across the answer. In my perl5 lib folder I found that somehow the IO and OLE libraries were missing on production. Copying those over from development resulted in everything working fine. The fact that I was unable to determine/capture this through conventional debugging efforts as opposed to merely comparing directory listings out of exasperation speaks to how much more I have to learn along these lines. But I'm confident that the great feedback I received will go a long ways towards getting me there. Thanks again, everyone.