Validate a domain with or without sub-domain using regular expression - PCRE - PHP

Question

I am trying to validate an email address using a simple regular expression, not an RFC 822-compliant one,

and I also need to capture the username, the sub-domain (if any), the domain, and the TLD suffix (com, net, ...). For this I've come up with the following regex:

/^([a-z0-9_\-\.]{6,})+@((?:[a-z0-9\.])*)([a-z0-9_\-]+)[\.]([a-z0-9]{2,})$/i

For example, the emails are:

username@domain.com
username@us.domain.com
username@au.domain.com
username@us.au.domain.com

The regex should validate all of them and capture all the groups.

So, is the regex correct, or is there anything else I need to consider?

Ahh, it's because I needed the username to be at least 6 characters. Sorry I forgot to include that within my question — bn00d

zx81 · Accepted Answer · 2014-04-22T06:19:58

n00p, I see that you had not yet found an expression to do exactly what you wanted, and that you said "may be someone will come up with better solution and post it here".

So here is a regex that does what you wanted. I have modified your own expression as little as possible, assuming that you knew what you wanted.

To make it easy to read, the expression is in free-spacing mode. You use it like any other regex.

$regex = "~(?ix) # case-insensitive, free-spacing
^                # assert head of string
([a-z0-9_-]{6,24})    # capture username to Group 1
(?<=[0-9a-z])     # assert that the previous character was a digit or letter
@                 # literal
(                 # start group 2: whole domain
(?:[a-z0-9-]+\.)* # optional subdomain: don't capture
(                 #start group 3: domain
[a-z0-9_-]+       # the last word
\.                # the dot
([a-z]{2,})       # capture TLD to group 4
)                 # end group 3: domain
)                 # end group 2: whole domain
$                 # assert end of string
~";

This will capture username to Group 1, the whole domain to Group 2, domain to Group 3, and the TLD to Group 4.
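
If you would rather not remember which number means what, the groups can be mapped to names. This is only a sketch: parse_email and the array keys are names made up for illustration, and the regex is the same one as above, written in a single-quoted string.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps the regex and
// returns the four captures under descriptive keys, or null when nothing matches.
function parse_email(string $email): ?array
{
    $regex = '~(?ix)
        ^
        ([a-z0-9_-]{6,24})   # group 1: username
        (?<=[0-9a-z])        # username must end in a letter or digit
        @
        (                    # group 2: whole domain
          (?:[a-z0-9-]+\.)*  # optional subdomains, not captured
          (                  # group 3: domain
            [a-z0-9_-]+
            \.
            ([a-z]{2,})      # group 4: TLD
          )
        )
        $
    ~';

    if (preg_match($regex, $email, $m) !== 1) {
        return null;
    }

    // The key names below are illustrative assumptions.
    return [
        'username'     => $m[1],
        'whole_domain' => $m[2],
        'domain'       => $m[3],
        'tld'          => $m[4],
    ];
}

print_r(parse_email('username@us.au.domain.com'));
```

Returning null for a non-match keeps validation and capture in one call, so the caller only has to check the return value.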

One small change you will see is that I have unescaped the - and . in the character classes, because they do not need escaping there (a - at the end of a class and a . anywhere inside one are already literal). I did not replace the [a-z0-9_] expressions with \w because, if you ever switch to Unicode or a different locale, \w might give surprising results.
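
To illustrate the \w point, here is a small check. It assumes the pattern is compiled with the u modifier and that your PHP build turns on Unicode properties for \w in that mode (typical, but treat it as an assumption). The sample name is made up.

```php
<?php
// Hypothetical username whose first letter is non-ASCII (u with umlaut).
// The \u{...} escape needs PHP 7 or later.
$name = "\u{00FC}sername";

// \w under the u modifier: assumed to follow Unicode letter properties.
$viaWordClass = preg_match('/^\w{6,24}$/u', $name);

// Explicit ASCII class: only a-z, 0-9 and underscore, whatever the mode.
$viaExplicit = preg_match('/^[a-z0-9_]{6,24}$/iu', $name);

var_dump($viaWordClass, $viaExplicit);
```

On such a build the first call matches and the second does not, so \w would quietly accept a username that the explicit class rejects.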

Here is the whole thing in use:

<?php
$emails = array("username@domain.com",
           "username@us.domain.com",
           "username@au.domain.com",
           "username@us.au.domain.com");

$regex = "~(?ix) # case-insensitive, free-spacing
^                # assert head of string
([a-z0-9_-]{6,24})    # capture username to Group 1
(?<=[0-9a-z])     # assert that the previous character was a digit or letter
@                 # literal
(                 # start group 2: whole domain
(?:[a-z0-9-]+\.)* # optional subdomain: don't capture
(                 #start group 3: domain
[a-z0-9_-]+       # the last word
\.                # the dot
([a-z]{2,})       # capture TLD to group 4
)                 # end group 3: domain
)                 # end group 2: whole domain
$                 # assert end of string
~";

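// Wrap the dumps in <pre> so the browser preserves print_r's indentation.
echo "<pre>";
foreach ($emails as $email) {
    // $match[0] is the full match; [1]..[4] are the capture groups.
    if (preg_match($regex, $email, $match)) {
        print_r($match);
    }
}
echo "</pre>";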
?>

And here is the output:

Array
(
    [0] => username@domain.com
    [1] => username
    [2] => domain.com
    [3] => domain.com
    [4] => com
)
Array
(
    [0] => username@us.domain.com
    [1] => username
    [2] => us.domain.com
    [3] => domain.com
    [4] => com
)
Array
(
    [0] => username@au.domain.com
    [1] => username
    [2] => au.domain.com
    [3] => domain.com
    [4] => com
)
Array
(
    [0] => username@us.au.domain.com
    [1] => username
    [2] => us.au.domain.com
    [3] => domain.com
    [4] => com
)
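
It is also worth checking what the pattern refuses. A quick sketch, with the same regex collapsed onto one line and a few test addresses made up for the purpose:

```php
<?php
// Same pattern as above on one line, with the i flag moved to the end.
$regex = '~^([a-z0-9_-]{6,24})(?<=[0-9a-z])@((?:[a-z0-9-]+\.)*([a-z0-9_-]+\.([a-z]{2,})))$~i';

// Hypothetical inputs chosen to hit each rule of the pattern.
$samples = [
    'username@us.domain.com', // fits every rule
    'user@domain.com',        // username shorter than 6 characters
    'username_@domain.com',   // username does not end in a letter or digit
    'user.name@domain.com',   // dot is not allowed in the username
    'username@domain.c',      // TLD shorter than 2 letters
];

$results = [];
foreach ($samples as $email) {
    $results[$email] = preg_match($regex, $email) === 1;
    echo $email, ' => ', $results[$email] ? 'match' : 'no match', PHP_EOL;
}
```

Only the first address should match. Note that a dot in the username is rejected, which differs from the class [a-z0-9_\-\.] in your original expression, so add the dot back if you need it.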
