Perl Regex - extracting ipv4 from maillog

Question

I'm working on a distributed fail2ban-like system in Perl/MySQL/iptables.

Extracting IPv4 addresses from /var/log/messages is working, but now I want to add /var/log/maillog to the soup.

I have a Perl regex:^[1]

/ (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?) /x

And a line from maillog:

v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

Here the regex matches both the 194.102.60.190 prefix of the hostname 194.102.60.190.host.ifxnetworks.com and the real address 190.60.102.194 in brackets.

In my code I have ($IP is the above regex):

if ($line =~ m/($IP)/)
{
    my ($ip) = $1;
    # ... further processing of $ip
}

Here the first IP-like string found is 194.102.60.190, taken from the hostname 194.102.60.190.host.ifxnetworks.com, instead of the bracketed address.
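A minimal reproduction of the problem, assuming $IP is assembled from a per-octet pattern $OCT with qr// (the asker's full program is not shown, so the surrounding code here is illustrative). The regex engine returns the leftmost match, which is the IP-like prefix of the hostname:

```perl
use strict;
use warnings;

# Per-octet pattern and full IPv4 pattern, equivalent to the /x regex above.
my $OCT = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;
my $IP  = qr/$OCT\.$OCT\.$OCT\.$OCT/;

my $line = 'v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] '
         . 'did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA';

my $ip;
if ($line =~ m/($IP)/) {
    # Leftmost match wins: the digits at the start of the hostname.
    $ip = $1;
}
print "$ip\n";    # prints 194.102.60.190
```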

So, how do I get the regex to ignore an IPv4-like string that is followed by a dot (.)?

^[1] For readability, the regex uses Perl's /x modifier, which ignores unescaped whitespace in the pattern.

I think dots after IP addresses are just a starting point to the problem — Wolf
I split your regex over lines for readability. Please check. If it's OK ... are all four indeed the same? — zdim
@zdim Yes. They are the same my ($IP) = $OCT . '\.' . $OCT . '\.' . $OCT . '\.' . $OCT :) — Mogens TrasherDK

zdim · Accepted Answer · 2017-09-01T08:21:46

If that's the only problem, try a negative lookahead:

my ($ip) = $line =~ /($IP)(?![.\d])/;

which works for the shown data.

The character class in the lookahead, [.\d], is needed because the last octet in the $IP regex can match a variable number of digits, via \d?. So with (?!\.) alone the engine can match one digit fewer than there are, and that remaining digit then satisfies the not-a-dot restriction.^†

Thus we need to disallow both the . and a digit following the pattern.
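A small sketch comparing the two lookaheads on the maillog line from the question, using the same octet pattern as the question; the variable names $dot_only and $dot_or_digit are illustrative. With (?!\.) alone the engine backtracks to a shortened last octet and returns the truncated 194.102.60.19:

```perl
use strict;
use warnings;

my $n  = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;
my $IP = qr/$n\.$n\.$n\.$n/;

my $line = 'v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] '
         . 'did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA';

# Forbid only a following dot: the last octet backtracks from "190" to "19",
# and the "0" that follows passes the check.
my ($dot_only) = $line =~ /($IP)(?!\.)/;

# Forbid a following dot or digit: the hostname prefix can no longer match,
# so the bracketed address is found instead.
my ($dot_or_digit) = $line =~ /($IP)(?![.\d])/;

print "dot only:     $dot_only\n";        # 194.102.60.19
print "dot or digit: $dot_or_digit\n";    # 190.60.102.194
```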

A complete program

use warnings;
use strict;

my $t = 'a 194.102.60.190.host.ifxnetworks.com [190.60.102.194] b';

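# One octet, 0 to 255
my $n = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;

# Four octets separated by literal dots
my $IP = qr/$n\.$n\.$n\.$n/;

# Capture every address that is not followed by a dot or a digit
my @m = $t =~ /($IP)(?![.\d])/g;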

print "@m\n";

prints 190.60.102.194
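Applied to the question's own per-line check, a sketch might look like this. The log text is a hypothetical stand-in for /var/log/maillog, read through an in-memory filehandle (open on a scalar reference) so the example runs without the real file; a real script would open the log path instead:

```perl
use strict;
use warnings;

my $n  = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;
my $IP = qr/$n\.$n\.$n\.$n/;

# Hypothetical log content; replace with the real /var/log/maillog.
my $log = <<'END';
v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
a line without any address
END

# In-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @found;
while (my $line = <$fh>) {
    # Same shape as the question's code, with the negative lookahead added.
    if ($line =~ m/($IP)(?![.\d])/) {
        my ($ip) = $1;
        push @found, $ip;
    }
}
close $fh;

print "$_\n" for @found;
```

Only the bracketed 190.60.102.194 is collected; the line without an address is skipped.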

^† Consider the substring 90.host. The pattern /\d\d?(?!\.)/ for it works as follows.

The first \d matches 9. The next one, \d?, is optional and greedy, so it first matches the 0; but then (?!\.) sees the . that follows and fails. The engine backtracks and lets \d? match nothing instead. Now (?!\.) sees the 0, which is not a ., so the lookahead succeeds and the whole pattern (wrongly) matches just the 9, as this one-liner shows:

perl -wE'$_ = q(90.host); @m = /(\d)(\d?)(?!\.)(.)/; say for @m'

prints

9

0

The middle capture group captured an empty string (the blank line), and the character right after the match, caught by (.), is the 0.

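Now consider the pattern /\d\d?(?![.\d])/ for the same substring. The (?![.\d]) requires that what follows is neither a . nor a digit. If \d? matches the 0, the next character is a .; if \d? matches nothing, the next character is the digit 0. Either way the lookahead fails, and starting the match at the 0 fails too, since it is followed by a . as well. So the pattern does not match.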

With (?![.\d]) instead of (?!\.) in the one-liner above, nothing is printed, since the pattern doesn't match at all. (In some shells you may have to escape the !, as in (?\![.\d]), or put the code in a script.)
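To sidestep shell quoting of ! entirely, the same experiment can be run as a small script. A sketch comparing both lookaheads on the substring 90.host; the array names are illustrative:

```perl
use strict;
use warnings;

my $s = '90.host';

# Forbid only a dot: \d? backtracks to match nothing, and the 0 that
# follows passes (?!\.), so the captures are 9, '' and 0.
my @dot_only = $s =~ /(\d)(\d?)(?!\.)(.)/;

# Forbid a dot or a digit: no way of splitting "90" satisfies the
# lookahead, so the match fails and the list is empty.
my @dot_or_digit = $s =~ /(\d)(\d?)(?![.\d])(.)/;

print "dot only:     ", join('|', @dot_only), "\n";    # 9||0
print "dot or digit: ",
    (@dot_or_digit ? join('|', @dot_or_digit) : 'no match'), "\n";
```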

The engine may not proceed exactly as described; this is a loose description of its operation.
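One way to see what the engine actually does is Perl's re pragma: use re 'debug' prints the compiled regex program and the matching steps to STDERR. A minimal sketch on the same substring; the trace format varies between perl versions, so no output is reproduced here:

```perl
use strict;
use warnings;
use re 'debug';    # dump compilation and matching steps to STDERR

# Watch \d? give back the 0 so that (?!\.) can succeed.
my $matched = '90.host' =~ /\d\d?(?!\.)/ ? 1 : 0;
print "matched: $matched\n";
```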

