I have Perl code running under mod_perl which connects to the OpenLDAP server slapd using the Net::LDAP module.

I am trying to set a connect timeout like this:

my $ldap = Net::LDAP->new($server, timeout => 120); 

but when slapd is heavily loaded I get connection attempts timing out after about 20 seconds.

Net::LDAP uses IO::Socket and IO::Select to implement its connection processing, in particular this code in IO::Socket (note that I've added a bit of extra debug code):

sub connect {
    @_ == 2 or croak 'usage: $sock->connect(NAME)';
    my $sock = shift;
    my $addr = shift;
    my $timeout = ${*$sock}{'io_socket_timeout'};
    my $err;
    my $blocking;

    my $start = scalar localtime;
    $blocking = $sock->blocking(0) if $timeout;
    if (!connect($sock, $addr)) {
        if (defined $timeout && ($!{EINPROGRESS} || $!{EWOULDBLOCK})) {
            require IO::Select;

            my $sel = new IO::Select $sock;

            undef $!;
            if (!$sel->can_write($timeout)) {
                $err = $! || (exists &Errno::ETIMEDOUT ? &Errno::ETIMEDOUT : 1);
                $@ = "connect: timeout";
            }
            elsif (!connect($sock,$addr) &&
                   not ($!{EISCONN} || ($! == 10022 && $^O eq 'MSWin32'))
            ) {
                # Some systems refuse to re-connect() to
                # an already open socket and set errno to EISCONN.
                # Windows sets errno to WSAEINVAL (10022)
                my $now = scalar localtime;
                $err = $!;
                $@ = "connect: (1) $! : start = [$start], now = [$now], timeout = [$timeout] : " . Dumper(\%!);
            }
        }
        elsif ($blocking || !($!{EINPROGRESS} || $!{EWOULDBLOCK}))  {
            $err = $!;
            $@ = "connect: (2) $!";
        }
    }

    $sock->blocking(1) if $blocking;

    $! = $err if $err;

    $err ? undef : $sock;
}

and I'm seeing log output like this:

connect: (1) Connection timed out : start = [Tue Jun 19 14:57:44 2012], now = [Tue Jun 19 14:58:05 2012], timeout = [120] : $VAR1 = {
          'EBADR' => 0,
          'ENOMSG' => 0,
<snipped>
          'ESOCKTNOSUPPORT' => 0,
          'ETIMEDOUT' => 110,
          'ENXIO' => 0,
          'ETXTBSY' => 0,
          'ENODEV' => 0,
          'EMLINK' => 0,
          'ECHILD' => 0,
          'EHOSTUNREACH' => 0,
          'EREMCHG' => 0,
          'ENOTEMPTY' => 0
        };
 : Started attempt at Tue Jun 19 14:57:44 2012

Where is the 20 second connect timeout coming from?

EDIT: I've found the culprit now: /proc/sys/net/ipv4/tcp_syn_retries, which is set to 5 by default; 5 retries take about 20 seconds. http://www.sekuda.com/overriding_the_default_linux_kernel_20_second_tcp_socket_connect_timeout

Is the spurious timeout always about 20 seconds, or does that vary? What OS and revision is the client running on? - pilcrow
+1 BTW. Nicely troubleshot before coming here. - pilcrow
The timeout is always about 20 seconds (and I've seen it when using IO::Socket directly too). This is on CentOS 6, 64-bit; IO::Socket version 1.31 comes from perl-5.10.1-115.el6.x86_64. It may be possible to upgrade, but that decision wouldn't be down to me. - Chris Card
I just found this: sekuda.com/… - Chris Card
I updated my answer with a link to a deeper discussion of this behavior. - pilcrow

1 Answer

Update: Some kernels are like that

The short answer is that some Linux kernels impose a 20-second timeout on connect() calls. This is a bug.

Note that the linked sekuda article is ambiguous: the default value of tcp_syn_retries (5), combined with the retry backoff, would give a timeout much greater than 20 seconds. The missing nuance is given in the bug discussion linked above.
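
To make the arithmetic concrete, here is a rough sketch of the expected total wait. It assumes the SYN timeout simply doubles after each retransmission and starts at either 3 seconds or 1 second; both starting values and the helper name syn_timeout_total are assumptions for illustration, and the real schedule depends on the kernel.

```perl
use strict;
use warnings;

# Hypothetical helper: total seconds before connect() gives up, assuming the
# SYN timeout starts at $initial_rto seconds and doubles after every
# retransmission (an assumption; real kernels may differ).
sub syn_timeout_total {
    my ($retries, $initial_rto) = @_;
    my ($total, $rto) = (0, $initial_rto);
    for (0 .. $retries) {    # the initial SYN plus $retries retransmissions
        $total += $rto;
        $rto   *= 2;
    }
    return $total;
}

# Compare two assumed initial timeouts for tcp_syn_retries = 5.
for my $rto (3, 1) {
    printf "%d retries, %ds initial timeout: %ds total\n",
        5, $rto, syn_timeout_total(5, $rto);
}
```

Under these assumptions the totals are 189 and 63 seconds, both well above the 20 seconds observed.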

Original answer

Try upgrading.

The connect sub in IO::Socket version 1.34 (shipped with perl 5.16, for instance) select()s sockets for writing and for errors. Sockets flagged with an error are then inspected with getsockopt()/SO_ERROR for the true error condition.

I suspect you're getting a TCP 'soft error' (e.g., an ICMP host unreachable now and again). However, your version of IO::Socket never sees the real error because it never looks at SO_ERROR.

If upgrading doesn't resolve the issue, then the right fix is to add logic to IO::Socket::connect that does what the Linux connect(2) man page suggests: inspect SO_ERROR after a nonblocking, connect()ing socket select()s as writable.
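
As a minimal sketch of that pattern (not the actual IO::Socket code), the standalone helper below does a nonblocking connect, waits for writability, and then reads SO_ERROR. The name connect_checking_so_error is hypothetical, and the helper assumes IPv4 only.

```perl
use strict;
use warnings;
use Errno qw(ETIMEDOUT);
use IO::Handle;
use IO::Select;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_ERROR inet_aton pack_sockaddr_in);

# Hypothetical helper, not part of IO::Socket or Net::LDAP.
# Returns a connected socket, or undef with $! set to the real connect error.
sub connect_checking_so_error {
    my ($host, $port, $timeout) = @_;

    my $ip = inet_aton($host) or return;    # name resolution failed
    socket(my $sock, PF_INET, SOCK_STREAM, 0) or return;
    $sock->blocking(0);

    my $err = 0;
    if (!connect($sock, pack_sockaddr_in($port, $ip))) {
        if ($!{EINPROGRESS} || $!{EWOULDBLOCK}) {
            # Wait until the attempt finishes (socket writable) or we time out.
            if (IO::Select->new($sock)->can_write($timeout)) {
                # Writable does not mean connected: SO_ERROR holds the outcome.
                my $packed = getsockopt($sock, SOL_SOCKET, SO_ERROR);
                $err = defined $packed ? unpack('i', $packed) : $! + 0;
            }
            else {
                $err = ETIMEDOUT;
            }
        }
        else {
            $err = $! + 0;                  # immediate failure
        }
    }

    if ($err) {
        close $sock;
        $! = $err;                          # report the real error to the caller
        return;
    }

    $sock->blocking(1);
    return $sock;
}
```

A caller would use it along the lines of: my $sock = connect_checking_so_error('127.0.0.1', 389, 120) or die "connect: $!";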

Cheap Workaround

In the meantime, something like ...

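# untested!
use strict;
use warnings;
use Errno;
use Net::LDAP;

...

my $relative_to = 120;                    # overall connect budget, in seconds
my $absolute_to = time() + $relative_to;  # absolute deadline
my $ldap;

TRYCONN: {
  $ldap = Net::LDAP->new($server, timeout => $relative_to);
  if (! $ldap and $!{ETIMEDOUT}) {
    # The connect gave up early: retry with whatever time remains.
    $relative_to = $absolute_to - time();
    redo TRYCONN if $relative_to > 0;
  }
}

die "Aaaaargh" unless $ldap;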

... or similar should do the trick.