I have a regex that should match only alphanumeric characters, ".", and "_" both before and after the @ sign. It should match only the following TLDs:
com, org, edu, gov, uk, net, ca, de, jp, fr, au, us, ru, ch, it, nl, se, no, mil, biz, io, cc, co, info
For example, it should match [email protected] and [email protected], but not [email protected] (contains a hyphen and a non-matching TLD) or [email protected] (.net is a matching TLD, but .neta is not).
I have the following regex:
my $email_regex = qr/[a-zA-Z0-9._]+\@[a-zA-Z0-9._]+\.(com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|mil|biz|io|cc|co|info)/;
This matches correctly up to the appropriate TLD, but if the TLD is followed by any additional alphanumeric characters, it still counts as a match (which it shouldn't); the output just omits the characters after the TLD.
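To make the symptom concrete, here is a minimal sketch that runs the regex from above against a made-up address. The address `user@example.neta` is hypothetical (the real input is not reproduced here); it is chosen only because `.neta` is not in the TLD list.

```perl
use strict;
use warnings;

# The regex from the question, unchanged.
my $email_regex = qr/[a-zA-Z0-9._]+\@[a-zA-Z0-9._]+\.(com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|mil|biz|io|cc|co|info)/;

# Hypothetical input: ".neta" is not an allowed TLD.
my $input = 'user@example.neta';

# Capture the whole match. Nothing in the pattern forbids more
# word characters after the TLD, so the engine stops after "net".
my ($hit) = $input =~ /($email_regex)/;
print "matched: $hit\n" if defined $hit;
```

The pattern is satisfied as soon as one of the listed TLDs has been consumed, so the trailing "a" is simply left outside the match rather than causing a failure.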
input:
[email protected] [email protected]<sender: [email protected] >
[email protected],[email protected]{}[email protected];
[email protected]
[email protected] : test
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]<;Senderfailure>
[email protected]
output (I have inserted comments to indicate what matched correctly and what matched but shouldn't have):
[email protected] #correct
[email protected] #correct
[email protected] #should not match
[email protected] #correct
[email protected] #correct
[email protected] #should not match
[email protected] #correct
[email protected] #correct
[email protected] #correct
[email protected] #correct
[email protected] #correct
[email protected] #should not match
[email protected] #correct
[email protected] #correct
[email protected] #should not match
EDIT: The input file contains many other characters after the emails, such as <, >, :, ;, and ".
These are okay: the email can still be matched, and those characters are just not included in the output, as seen above.
...co|info is the last thing in the string; the way it is now, the regex matches the given pattern, but it's OK with the string having more after it. So you need to add the end-of-string anchor: ...co|info)$/ (or \Z) – zdim
= qr/^ . – Stefan Becker
\b, as @Nick says in their answer, except that you may have to allow < instead. – zdim
[email protected] not match? It seems to fulfill your requirements? – Stefan Becker
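The \b suggestion from the comments can be sketched as follows. This is only an illustration under stated assumptions: the sample line and its addresses are invented, and the variable `$tlds` is a helper introduced here, not part of the original code. A word boundary is used rather than `$`, because the EDIT says characters such as < or ; may follow an address.

```perl
use strict;
use warnings;

# TLD list from the question, kept in a helper variable (an assumption
# made here for readability).
my $tlds = 'com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|mil|biz|io|cc|co|info';

# \b after the TLD requires that the next character is not a word
# character, so ".neta" is rejected while ".net<" or ".net;" is accepted.
my $email_regex = qr/[a-zA-Z0-9._]+\@[a-zA-Z0-9._]+\.(?:$tlds)\b/;

# Hypothetical input line with trailing punctuation after some addresses.
my $line = 'a.b@example.com<sender: c_d@example.org >, e@example.neta; f@example.info"';

# In list context with /g, the match returns every captured address.
my @found = $line =~ /($email_regex)/g;
print "$_\n" for @found;
```

With this sample line, the three addresses ending in .com, .org, and .info are returned and the .neta one is skipped. Note that \b only constrains what follows the TLD; an address with a hyphen in the local part would still match from the first character after the hyphen, so a leading assertion such as `(?<![\w.-])` may be needed if those must be rejected entirely.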