I have a regex that should match only alphanumeric characters, ".", and "_" both before and after the @ sign. It should match only the following TLDs:
com, org, edu, gov, uk, net, ca, de, jp, fr, au, us, ru, ch, it, nl, se, no, mil, biz, io, cc, co, info
For example, it should match sample22_test.tester.edu@auto.gmail.mil and test@gmail.com, but not anothertest.325-2352@yahoo.pys (contains a hyphen and a non-matching TLD) or tester1234@yahoo.neta (.net is a matching TLD, but .neta is not).
I have the following regex:
my $email_regex = qr/[a-zA-Z0-9._]+\@[a-zA-Z0-9._]+\.(com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|mil|biz|io|cc|co|info)/;
This matches correctly up to the appropriate TLD. However, if the TLD is followed by additional alphanumeric characters, the regex still counts it as a match (which it shouldn't); the returned match simply leaves out the characters after the TLD.
input:
sample@gmail.com example@autotest.comcast.net<sender: apache.apache_testapache@apache.edu >
whoisthis@questions.gov,find@find.co{}Failure@pastattempts.frz;
sample2@yahoo.com
sample5@test.biz : test
sample92.sdfj@gmail.com
sample22_242@tech.org
greenjeans_93_who.ask@tester.info
computergeek324@ask.nets
anothertest.tester.gov@gmail.ch
helloooooow232@aol.com<;Senderfailure>
finaltest23_3test@yahoo.its
output (I have inserted comments to indicate what matched correctly and what matched but shouldn't have):
sample@gmail.com #correct
example@autotest.comcast.net #correct
apache.apache_testapache@apache.edu #should not match
whoisthis@questions.gov #correct
find@find.co #correct
Failure@pastattempts.fr #should not match
sample2@yahoo.com #correct
sample5@test.biz #correct
sample92.sdfj@gmail.com #correct
sample22_242@tech.org #correct
greenjeans_93_who.ask@tester.info #correct
computergeek324@ask.net #should not match
anothertest.tester.gov@gmail.ch #correct
helloooooow232@aol.com #correct
finaltest23_3test@yahoo.it #should not match
EDIT: The input file contains many other characters after the email addresses, such as <, >, :, ;, and ". These are okay and the address can still be matched; they just should not be included in the output, as seen above.
...co|info is the last thing in the string; the way it is now, the regex matches the given pattern, but it's OK if the string then has more after it. So you need to add the end-of-string anchor: ...co|info)$/ (or \Z) - zdim
= qr/^. - Stefan Becker
\b, as @Nick says in their answer, except that you may have to allow < instead. - zdim
Why should apache.apache_testapache@apache.edu not match? It seems to fulfill your requirements? - Stefan Becker
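The \b suggestion from the comments can be sketched as follows. This is a minimal illustration, not a full email validator: the names `$tld` and `extract_emails` are made up for the example, and the sample lines are a subset of the input shown in the question. A `$` anchor would not fit this input, because several addresses sit in the middle of a line, so the sketch uses a word boundary after the TLD instead. It only addresses the trailing-character problem; by the rules as stated, apache.apache_testapache@apache.edu is still reported because it is followed by a space.

```perl
use strict;
use warnings;

# TLD list copied from the question; (?:...) keeps it non-capturing.
my $tld = qr/(?:com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|mil|biz|io|cc|co|info)/;

# \b after the TLD requires a word boundary there, so a TLD that is
# immediately followed by another letter, digit, or underscore
# (".nets", ".its", ".frz") no longer counts as a match.
my $email_regex = qr/[a-zA-Z0-9._]+\@[a-zA-Z0-9._]+\.$tld\b/;

# Hypothetical helper: returns every address found in one line of text.
sub extract_emails {
    my ($text) = @_;
    my @found = $text =~ /$email_regex/g;   # list context: all whole matches
    return @found;
}

# A few lines taken from the sample input in the question.
my @lines = (
    'sample@gmail.com example@autotest.comcast.net<sender: apache.apache_testapache@apache.edu >',
    'whoisthis@questions.gov,find@find.co{}Failure@pastattempts.frz;',
    'computergeek324@ask.nets',
    'helloooooow232@aol.com<;Senderfailure>',
    'finaltest23_3test@yahoo.its',
);

# Print one address per line; the .frz, .nets, and .its lines print nothing.
print "$_\n" for map { extract_emails($_) } @lines;
```

If an address followed by a further dot or underscore should also be rejected, a negative lookahead such as (?![a-zA-Z0-9._]) in place of \b may be closer to the intent, at the cost of also rejecting an address that ends a sentence with a period.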