0
votes

I am trying to extract email addresses from an HTML file. I only need addresses whose domain has a one-level top-level domain (TLD). For example, among the email addresses given below, the ones in bold are invalid in this case.

I am using the following regex. It works fine if the input contains only email addresses, but if I add any text after an email address, it no longer matches.

(?=<\s|^)\b[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+.[a-zA-Z]{2,6}$(?=\s|$|.+)

Success case:

Failure case:

Any help in this scenario will be really appreciated.

3
Remove the dollar sign from the end of the regex. And don't forget that a TLD can be longer than 6 characters. - pavel
This solution accepts TLDs of up to two levels, whereas I need a one-level TLD only. - Saqib Shafique

3 Answers

0
votes

I made this regex:

(?<=\s|^)([a-z0-9-.])+@+([a-z0-9-]*)\.([a-z]*)\s

It extracts email addresses with a one-level TLD from a string. You can tokenize the text on spaces and line breaks, then match each token against the regex.

follow this link
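
The tokenize-then-match idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the pattern `ONE_LEVEL_EMAIL` and the helper `extract_one_level_emails` are hypothetical names, the pattern is a simplified variant rather than the regex above (Python's `re` rejects the variable-width lookbehind `(?<=\s|^)`), and it assumes addresses are separated from surrounding text by whitespace, so a token with trailing punctuation or adjacent HTML tags would be skipped.

```python
import re

# Assumed, simplified pattern: local part, "@", exactly one domain label,
# one dot, then an alphabetic TLD of two or more letters.
ONE_LEVEL_EMAIL = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}")


def extract_one_level_emails(text):
    """Split text on whitespace and keep tokens that are one-level-TLD emails."""
    # fullmatch requires the whole token to fit the pattern, so a token such as
    # "jane@mail.example.co.uk" is rejected because of the extra ".co.uk".
    return [token for token in text.split() if ONE_LEVEL_EMAIL.fullmatch(token)]


if __name__ == "__main__":
    sample = "write to john@example.com or jane@mail.example.co.uk today"
    print(extract_one_level_emails(sample))
```

Because each token is matched in full, text that follows an address on the same line does not affect the result, which is the failure described in the question.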

0
votes

I wrote this regex as a custom validator to extract email addresses like these.

Try this:

^(?<check_Duplicate_Special_Symbol>(?![\w-.]*[\.@][\.@][\w-]*))(?<user>(?!\.)[\w.-]+)(?<domain>@(?:[A-Za-z][\w-.]*))(?<subDomain>(?:\.[A-Za-z][\w-.]*)+)$

For more info see regex-demo

However, it is not a good choice. Consider how-to-validate-an-email-address-using-a-regular-expression to get a correct validator.

0
votes

Try this for a single match:

(?:\s)(.[^@]*@[^.]*\.[^.0-9A-Z]*)(?:\s)

Or this for a top-level match plus a match per section:

(?:\s)((.[^@]*)(?:@)([^.]*)(?:\.)([^.0-9A-Z]*))(?:\s)
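
For illustration, the per-section capture could look like this in Python. It is a hedged sketch, not the regex above: `SECTIONED_EMAIL` and `split_emails` are hypothetical names, named groups replace the numbered ones, the TLD is restricted to lowercase letters, and the lookarounds `(?<!\S)` and `(?!\S)` stand in for the surrounding `(?:\s)` so that an address at the very start or end of the text can still match.

```python
import re

# Assumed pattern: user, "@", one domain label, ".", lowercase TLD.
# (?<!\S) and (?!\S) require whitespace or a string boundary on each side
# without consuming it.
SECTIONED_EMAIL = re.compile(
    r"(?<!\S)(?P<user>[^@\s]+)@(?P<domain>[^.\s@]+)\.(?P<tld>[a-z]+)(?!\S)"
)


def split_emails(text):
    """Return one dict per one-level-TLD address with user, domain and tld."""
    return [match.groupdict() for match in SECTIONED_EMAIL.finditer(text)]


if __name__ == "__main__":
    for parts in split_emails("mail bob@site.org and amy@a.b.com now"):
        print(parts)
```

An address such as `amy@a.b.com` is skipped because a second dot follows the first label after the domain, so the closing `(?!\S)` check fails.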