I would like to match domain names according to the following rules:
- The domain name may contain only a-z, A-Z, 0-9 and hyphen (-)
- The domain name should be between 1 and 63 characters long
- The last Tld must be at least 2 and at most 6 characters long
- The domain name should not start or end with a hyphen (-) (e.g. -google.com or google-.com)
- The domain name can be a subdomain (e.g. mkyong.blogspot.com)
I already have the Java-flavored regex; I just need the Python-flavored equivalent:
^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$
I couldn't find a Python regex for this, as everyone expects the use of urlparse. I don't need to split a URL into domain, port, Tlds and so on; I only need to do a simple domain replacement, so a regex should be the right solution for me.
What I have done:
expectedstring = re.sub(r"^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$" , "XXX" , string)
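One likely reason the attempt above replaces nothing: the Java pattern is written for a Java string literal, where `\\.` becomes the regex `\.`. Inside a Python raw string, `\\.` stays as written and means "a literal backslash followed by any character". The `^` and `$` anchors also require the whole string to be a domain, which cannot hold for a sentence. A small sketch of the escaping difference (the variable names are illustrative only):

```python
import re

java_style = r"\\."    # as a regex: a literal backslash, then any character
python_style = r"\."   # as a regex: a literal dot

# No backslash in the text, so the Java-style escape finds nothing.
print(re.search(java_style, "example.com"))

# The single-backslash form finds the dot.
print(re.search(python_style, "example.com").group())

# The Java-style escape only matches when a real backslash is present.
print(re.search(java_style, "a\\b").group())
```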
Example strings:
string = "This is why this domain example.com will never be the same after some years, it might just be example.co.uk but will never get to example.-com. Documents could be located in this specific location http://en.example.com/documents/print.doc as you probably already know."
expectedstring = "This is why this domain XXX will never be the same after some years, it might just be XXX but will never get to example.-com. Documents could be located in this specific location http://XXX/documents/print.doc as you probably already know."
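A minimal sketch of one way to get from `string` to `expectedstring`, assuming the `^`/`$` anchors are replaced by lookarounds so that domains can be found inside running text. The name `DOMAIN_IN_TEXT` and the second lookbehind (which skips path segments such as `/print.doc`) are assumptions made for this example, not part of the original Java pattern; the path check is only a heuristic.

```python
import re

DOMAIN_IN_TEXT = re.compile(
    r"(?<![A-Za-z0-9.-])"                    # not in the middle of a longer name
    r"(?<![A-Za-z0-9]/)"                     # heuristic: skip path segments like /print.doc
    r"(?:(?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+"  # one or more labels, each followed by a dot
    r"[A-Za-z]{2,6}"                         # final Tld: 2 to 6 letters
    r"(?![A-Za-z0-9-])"                      # the Tld must end here
)

string = (
    "This is why this domain example.com will never be the same after some "
    "years, it might just be example.co.uk but will never get to example.-com. "
    "Documents could be located in this specific location "
    "http://en.example.com/documents/print.doc as you probably already know."
)

expectedstring = DOMAIN_IN_TEXT.sub("XXX", string)
print(expectedstring)
```

The lookarounds do the job of the anchors: a match may not start inside a longer name and may not be followed by further label characters, which is why `example.-com` is left untouched.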
List of valid domain names
- www.google.com
- google.com
- mkyong123.com
- mkyong-info.com
- sub.mkyong.com
- sub.mkyong-info.com
- mkyong.com.au
- g.co
- mkyong.t.t.co
List of invalid domain names, and why.
- mkyong.t.t.c - Tld must be between 2 and 6 characters long
- mkyong,com - Comma is not allowed
- mkyong - No Tld
- mkyong.123 - Tld must not contain digits
- .com - Must start with [A-Za-z0-9]
- mkyong.com/users - No Tld
- -mkyong.com - Cannot begin with a hyphen -
- mkyong-.com - Cannot end with a hyphen -
- sub.-mkyong.com - Cannot begin with a hyphen -
- sub.mkyong-.com - Cannot end with a hyphen -
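The two lists above can be checked against a direct Python translation of the Java pattern (single backslash before the dot) using `re.fullmatch`, which validates a whole string rather than searching inside text. This is a sketch; `DOMAIN_RE`, `valid` and `invalid` are names chosen for the example, and the hyphen-prefixed entry is written as `-mkyong.com` to match its stated reason.

```python
import re

# Same rules as the Java pattern; fullmatch() makes ^ and $ unnecessary.
DOMAIN_RE = re.compile(r"(?:(?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}")

valid = [
    "www.google.com", "google.com", "mkyong123.com", "mkyong-info.com",
    "sub.mkyong.com", "sub.mkyong-info.com", "mkyong.com.au", "g.co",
    "mkyong.t.t.co",
]
invalid = [
    "mkyong.t.t.c", "mkyong,com", "mkyong", "mkyong.123", ".com",
    "mkyong.com/users", "-mkyong.com", "mkyong-.com", "sub.-mkyong.com",
    "sub.mkyong-.com",
]

for name in valid + invalid:
    verdict = "valid" if DOMAIN_RE.fullmatch(name) else "invalid"
    print(f"{name}: {verdict}")
```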
string ? – tobias_k
mkyong.t.t.t.co – Quinn