How to wait for grandchild process (`bash` retval becomes -1 in Perl due to SIGCHLD)

Question

I have a Perl script (snippet below) that runs from cron to perform system checks. I fork a child to act as a timeout and reap it with a $SIG{CHLD} handler. The Perl script makes several system calls to Bash scripts and checks their exit status. One Bash script fails about 5% of the time with no error: the Bash script exits with 0, but Perl sees $? as -1 and $! as "No child processes".

This Bash script tests compiler licenses, and an Intel icc process is left around after the Bash script completes (ps output below). I think the icc zombie completes, forcing Perl into the $SIG{CHLD} handler, which blows away the $? status before I'm able to read it.

Compile status -1; No child processes

#!/usr/bin/perl
use strict;
use POSIX ':sys_wait_h';

my $GLOBAL_TIMEOUT = 1200;

### Timer to notify if this program hangs
my $timer_pid;
$SIG{CHLD} = sub {
    local ($!, $?);
    while((my $pid = waitpid(-1, WNOHANG)) > 0)
    {
        if($pid == $timer_pid)
        {
            die "Timeout\n";
        }
    }
};

die "Unable to fork\n" unless(defined($timer_pid = fork));
if($timer_pid == 0)  # child
{
    sleep($GLOBAL_TIMEOUT);
    exit;
}
### End Timer

### Compile test
my @compile = `./compile_test.sh 2>&1`;
my $status = $?;
print "Compile status $status; $!\n";
if($status != 0)
{
    print "@compile\n";
}

END  # Timer cleanup
{
    if($timer_pid != 0)
    {
        $SIG{CHLD} = 'IGNORE';
        kill(15, $timer_pid);
    }
}

exit(0);

#!/bin/sh

cc compile_test.c
if [ $? -ne 0 ]; then
    echo "Cray compiler failure"
    exit 1
fi

module swap PrgEnv-cray PrgEnv-intel
cc compile_test.c
if [ $? -ne 0 ]; then
    echo "Intel compiler failure"
    exit 1
fi

wait
ps
exit 0

The wait doesn't really wait: cc calls icc, which leaves behind a zombie grandchild process that wait (or wait PID) doesn't block for. Running wait `pidof icc` (31589 in this case) gives "not a child of this shell".

user 31589     1  0 12:47 pts/15   00:00:00 icc
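
The same parent-only rule applies to Perl's waitpid: a process can wait only on its own direct children, never on grandchildren. A minimal sketch of that behavior, assuming a Unix-like system; the pipe used to hand the grandchild's PID back to the top-level process is purely an illustration device and not part of the original scripts:

```perl
use strict;
use warnings;

# Pipe used only to report the grandchild's PID to the top-level process.
pipe(my $reader, my $writer) or die "pipe failed: $!\n";

my $child = fork;
die "Unable to fork: $!\n" unless defined $child;

if ($child == 0) {                      # child
    close $reader;
    my $grandchild = fork;
    die "Unable to fork: $!\n" unless defined $grandchild;
    if ($grandchild == 0) {             # grandchild: outlives its parent
        close $writer;
        sleep 2;
        exit 0;
    }
    print {$writer} "$grandchild\n";    # report the grandchild's PID
    close $writer;
    exit 0;                             # exit without waiting for the grandchild
}

close $writer;
chomp(my $grandchild_pid = <$reader>);
close $reader;

# Waiting on the direct child succeeds and returns its PID.
my $reaped = waitpid($child, 0);

# Waiting on the grandchild fails: it was never our child.
my $result   = waitpid($grandchild_pid, 0);
my $no_child = $!{ECHILD} ? 1 : 0;      # capture errno right away

print "direct child reaped: ", ($reaped == $child ? "yes" : "no"), "\n";
print "waitpid on grandchild returned $result",
      ($no_child ? " (ECHILD)" : ""), "\n";
```

The second waitpid returns -1 with errno set to ECHILD, which is the Perl-side counterpart of the shell's "not a child of this shell" message.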

I just don't know how to fix this in Bash or Perl.

Thanks, Chris

It looks like you are going to a lot of trouble to avoid using alarm. Is there a reason not to use alarm here? — mob
Your SIGCHLD handler is also reaping the shell spawned by the backticks, so the waitpid call done by the backticks fails (since the child has already been reaped). — ikegami
I have several bash calls in the real Perl script. Only this one fails periodically. Just noticed today the icc left behind, that "wait" can't catch. — Chris
"this one fails" -- I didn't get what fails? The fact that icc stays around (which is awkward), or is there an actual error? Note that "Compile status -1; No child processes" isn't an error since you have a CHLD handler and check $? after backticks, which may have gotten reaped by handler (so the only error is doing both). Also, from what you show it appears that cc starts icc and doesn't wait for it ...? (Are you sure? That sounds really strange to me.) — zdim
Note, you can't really check wait 31589 (or such) since you don't know what PID of a child is in the current run (it is most likely different from what it was in previous runs). — zdim
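
One way to act on ikegami's observation while keeping the forked timer is to have the handler ask only about the timer's PID instead of calling waitpid(-1, ...). The following is a sketch under that assumption, not the original poster's code; the sh -c command is a stand-in for ./compile_test.sh:

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

my $GLOBAL_TIMEOUT = 1200;
my $timer_pid;

$SIG{CHLD} = sub {
    local ($!, $?);
    # Ask only about the timer child. Any other child (such as the shell
    # started by backticks) is left for its own waitpid call to collect.
    return unless defined $timer_pid && $timer_pid > 0;
    die "Timeout\n" if waitpid($timer_pid, WNOHANG) == $timer_pid;
};

defined($timer_pid = fork) or die "Unable to fork: $!\n";
if ($timer_pid == 0) {      # child: acts purely as a timer
    sleep($GLOBAL_TIMEOUT);
    exit 0;
}

# Stand-in command for illustration; the original runs ./compile_test.sh
my @output = `sh -c "echo compiling; exit 3" 2>&1`;
my $status = $?;
printf "exit code %d\n", $status >> 8;    # prints "exit code 3"

# Timer cleanup: stop handling SIGCHLD before killing the timer, so the
# timer's death is not mistaken for a timeout.
$SIG{CHLD} = 'DEFAULT';
kill 'TERM', $timer_pid;
waitpid($timer_pid, 0);
```

Because the handler never reaps the backticks' shell, $? still holds the real exit status when it is read on the next line.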

mob · Accepted Answer · 2019-06-05T20:29:08

Isn't this a use case for alarm? Toss out your SIGCHLD handler and say

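my @compile;
local $? = -1;    # default status if the command never completes
eval {
    local $SIG{ALRM} = sub { die "Timeout\n" };
    alarm($GLOBAL_TIMEOUT);
    @compile = `./compile_test.sh 2>&1`;
    alarm(0);     # command finished in time: cancel the pending alarm
};

my $status = $?;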

instead.
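
A slightly fuller sketch of the same alarm idea, wrapped in a helper so that a timeout can be told apart from an ordinary failure. The name run_with_timeout and its return convention are hypothetical, chosen only for illustration; the demo command stands in for ./compile_test.sh:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run a shell command via
# backticks and give up after $timeout seconds.
# Returns ($status, $timed_out, @output_lines).
sub run_with_timeout {
    my ($timeout, $command) = @_;
    my @output;
    my $status = -1;            # stays -1 if the command never finished

    my $ok = eval {
        local $SIG{ALRM} = sub { die "Timeout\n" };
        alarm($timeout);
        @output = `$command 2>&1`;
        $status = $?;           # read $? immediately after the backticks
        alarm(0);               # finished in time: cancel the alarm
        1;
    };
    my $error = $@;
    alarm(0);                   # never leave an alarm pending

    my $timed_out = (!$ok && $error eq "Timeout\n") ? 1 : 0;
    die $error if !$ok && !$timed_out;    # rethrow unexpected errors

    return ($status, $timed_out, @output);
}

# Demo with a stand-in command that exits with code 3.
my ($status, $timed_out, @lines) =
    run_with_timeout(5, 'sh -c "echo hello; exit 3"');
printf "exit code %d, timed out: %s\n",
    $status >> 8, $timed_out ? 'yes' : 'no';
```

Note that when the alarm fires, the shell started by the backticks is not killed; it keeps running in the background, so a long-lived script may need to track and terminate it separately.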
