3
votes

I'm working on a distributed fail2ban-like system in Perl/MySQL/iptables.

Extracting IPv4 addresses from /var/log/messages is working, but now I want to add /var/log/maillog to the soup.

I have a Perl regex [1]:

/ (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?)\.
  (?:25[012345]|2[0-4]\d|1?\d\d?) /x

And a line from maillog:

v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

Here the regex matches both the 194.102.60.190 in 194.102.60.190.host.ifxnetworks.com and the 190.60.102.194 in [190.60.102.194].

In my code I have ($IP is the above regex):

if ($line =~ m/($IP)/)
{
    my ($ip) = $1;

Here the first IP-like string is found: the 194.102.60.190 at the start of 194.102.60.190.host.ifxnetworks.com.

So, how do I get the regex to ignore an IPv4 address that is followed by a dot (.)?


[1] For readability, Perl supports the /x option.

2
Did you try the negative lookahead? like /...(?!\.)/ - zdim
I think dots after IP addresses are just a starting point to the problem - Wolf
@zdim Yes. Matches 194.102.60.190.host.ifxnetworks.com - Mogens TrasherDK
I split your regex over lines for readability. Please check. If it's OK ... are all four indeed the same? - zdim
@zdim Yes. They are the same: my ($IP) = $OCT . '\.' . $OCT . '\.' . $OCT . '\.' . $OCT :) - Mogens TrasherDK

2 Answers

5
votes

If that's the only problem, try a negative lookahead:

my ($ip) = $line =~ /($IP)(?![.\d])/;

which works for the shown data.

The character class in the lookahead, [.\d], is needed because the last term in the $IP regex allows for a variable number of digits, via \d?. So with (?!\.) alone the engine can match one digit fewer than there are, and the leftover digit then satisfies the not-a-dot restriction.

Thus we need to disallow both the . and a digit following the pattern.


A complete program

use warnings;
use strict;

my $t = 'a 194.102.60.190.host.ifxnetworks.com [190.60.102.194] b';

my $n = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;

my $IP = qr/$n\.$n\.$n\.$n/;

my @m = $t =~ /($IP)(?![.\d])/g;

print "@m\n";

prints 190.60.102.194
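
To tie this back to the if ($line =~ ...) block in the question, here is a minimal sketch of the same lookahead used per line in scalar context. The helper name first_ip and the second sample line are made up for illustration; only the first line comes from the question.

```perl
use strict;
use warnings;

my $n  = qr/(?:25[012345]|2[0-4]\d|1?\d\d?)/;
my $IP = qr/$n\.$n\.$n\.$n/;

# Hypothetical helper: returns the first address that is not followed
# by a dot or a digit, or undef if the line has none.
sub first_ip {
    my ($line) = @_;
    return $line =~ /($IP)(?![.\d])/ ? $1 : undef;
}

# First line is the maillog line from the question; the second is a
# made-up line with no address, to show the non-matching branch.
my @lines = (
    'v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA',
    'hypothetical line without an address',
);

for my $line (@lines) {
    my $ip = first_ip($line);
    print defined $ip ? "found $ip\n" : "no address\n";
}
```

For the first line this reports 190.60.102.194, skipping the digits at the start of the hostname.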


Consider the substring 90.host. The pattern /\d\d?(?!\.)/ for it works as follows.

The first \d matches 9. The next one, \d?, is optional but greedy, so it first takes the 0; then (?!\.) sees the following . and fails. The engine backtracks and lets \d? match nothing. Now (?!\.) looks at the 0, which is not a ., so the lookahead succeeds and the whole pattern (wrongly) matches with only the 9 consumed:

perl -wE'$_ = q(90.host); @m = /(\d)(\d?)(?!\.)(.)/; say for @m'

prints

9

0

The middle capture group matched the empty string (hence the blank line), and the final group, (.), captured the 0.

Now consider the pattern /\d\d?(?![.\d])/ for the same substring. The (?![.\d]) requires that what follows is neither a . nor a digit. If \d? gives up the 0, the lookahead sees a digit and fails; if \d? keeps the 0, the lookahead sees the . and fails. Either way the pattern fails.

With (?![.\d]) in the above one-liner test instead of (?!\.) nothing is printed, as the pattern doesn't match at all. (In some shells you may have to escape !, so (?\![.\d]), or use a script.)

The engine may well not proceed exactly as described; this is a loose description of its operation.
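
The two one-liner tests above can also be run as a small script, which sidesteps any shell quoting of !. This is only a sketch of the comparison; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

# Substring used in the walkthrough above.
my $s = '90.host';

# Lookahead that only forbids a dot: \d? backtracks to empty,
# so the pattern matches with groups '9', '' and '0'.
my @dot_only = $s =~ /(\d)(\d?)(?!\.)(.)/;

# Lookahead that forbids a dot or a digit: the match fails,
# and a failed match in list context returns an empty list.
my @dot_or_digit = $s =~ /(\d)(\d?)(?![.\d])(.)/;

print 'dot only:     ', (@dot_only     ? join('|', @dot_only)     : 'no match'), "\n";
print 'dot or digit: ', (@dot_or_digit ? join('|', @dot_or_digit) : 'no match'), "\n";
```

The first line shows 9||0 (the empty middle group sits between the two bars), and the second shows no match.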

0
votes

In general, regular expressions match wanted patterns in existing character sequences; it's always a bit harder to not match when something unwanted is present.

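You can match IP addresses[1] that are followed by a character that is neither a dot nor a digit ([^.\d]). The digit has to be excluded as well: with a plain [^.], the last \d{1,3} can backtrack to fewer digits and the leftover digit then satisfies [^.], so the pattern would still match inside 194.102.60.190.host (with \d{1,3} taking 19 and [^.] consuming the 0).

(?:\d{1,3}\.){3}\d{1,3}[^.\d]

and IP addresses at the end of the line ($):

(?:\d{1,3}\.){3}\d{1,3}$

You may combine the two patterns by alternation (|) in a non-capturing group ((?:...)):

(?:\d{1,3}\.){3}\d{1,3}(?:[^.\d]|$)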

A similar problem may be that your next task could be to exclude IP addresses that have a dot before them. Another problem is that the pattern would also match 2.3.4.5 in 1.2.3.4.5, which leads back to my introductory statement...
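
One possible way to handle both of these cases, sketched here as an assumption rather than as part of the patterns above, is to guard the address on both sides with zero-width assertions: a negative lookbehind and a negative lookahead that each reject a dot or a digit. The helper name find_ips is made up for illustration.

```perl
use strict;
use warnings;

# Simplified address pattern, as in this answer; values are not range-checked.
my $ip = qr/(?:\d{1,3}\.){3}\d{1,3}/;

# Assumed guards: no dot or digit directly before or after the address.
my $guarded = qr/(?<![.\d])($ip)(?![.\d])/;

# Hypothetical helper: returns every guarded address found in the text.
sub find_ips {
    my ($text) = @_;
    return $text =~ /$guarded/g;
}

for my $text ('1.2.3.4.5',
              '194.102.60.190.host.ifxnetworks.com [190.60.102.194]') {
    my @found = find_ips($text);
    print "$text => ", (@found ? "@found" : 'no match'), "\n";
}
```

Because the assertions consume nothing, 1.2.3.4.5 yields no match at all, and the second string yields only 190.60.102.194.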

I think that the IP addresses you are trying to match are best found with a pattern that checks the surrounding characters as well. Be specific about this. During development, try to catch the non-matching lines by matching them against "garbage patterns". In the case shown in the question (where spaces and brackets are acceptable surroundings), I'd suggest using

(?:[ \[]|^)((?:\d{1,3}\.){3}\d{1,3})(?:[ \]]|$)
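
As a quick check, the pattern above can be run against the maillog line from the question. This is a minimal sketch; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

# The maillog line from the question.
my $line = 'v817YjcU016645: 194.102.60.190.host.ifxnetworks.com [190.60.102.194] '
         . 'did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA';

# Address surrounded by a space, a bracket, or a line boundary;
# only the address itself is captured.
my $re = qr/(?:[ \[]|^)((?:\d{1,3}\.){3}\d{1,3})(?:[ \]]|$)/;

my ($ip) = $line =~ $re;
print defined $ip ? "$ip\n" : "no match\n";
```

This prints 190.60.102.194. Note that the surrounding characters are consumed by the match, so with /g two addresses separated by a single space would not both be found; the first match takes the space that the second would need in front of it.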

[1] I use a simplified regex here that also matches 333.333.333.333 or 000.000.000.000. It can of course be improved to restrict matches to valid IP addresses, but solutions for this are abundant.