Autohotkey RegExReplace Skip unmatched pattern

Question

How can I skip a line that does not match when replacing with a regex?

For example, below are the contents of my test.txt:

[email protected]
[email protected]
elke engineering ltd.,@yahoo.com
[email protected]
[email protected]

Below is my AutoHotkey script with the regex code:

ReplaceEmailsRegEx := "i)([a-z0-9]+(\.*|\_*|\-*))+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"
RemoveDuplicateCharactersRegEx := "s)(.)(?=.*\1)"

Try{
    FileRead, EmailFromTxtFile, test.txt
    OtherThanEmails := RegExReplace(EmailFromTxtFile, ReplaceEmailsRegEx)
    Chars := RegExReplace(OtherThanEmails, RemoveDuplicateCharactersRegEx)
    Loop{
        StringReplace, OtherThanEmails, OtherThanEmails, `r`n`r`n,`r`n, UseErrorLevel
        If ErrorLevel = 0
            Break
    }
    If (StrLen(OtherThanEmails)){
        Msgbox The Characters found other than email:`n%OtherThanEmails%
    }
}
catch e {
    ErrorString:="what: " . e.what . "file: " . e.file . " line: " . e.line . " msg: " . e.message . " extra: " . e.extra
    Msgbox An Exception was thrown`n%ErrorString%
}
Return

When it runs the replacement on test.txt, it throws an error:

e.what contains 'RegExReplace', e.line is 10

It executes without error when I remove the third entry from test.txt. So how can I change my regex to skip the problematic string?

It exits from the execution of the whole file on error. So it skips remaining valid email matches — Dhay
For the person who downvoted: May I know the reason so that I can improve my next posts to be useful. — Dhay
You got a classical catastrophic backtracking with your regex. Where did you get this pattern from? Please try i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*@([a-z][-a-z0-9]+\.)+[a-z]{2,6}. Or i)[a-z0-9]+(?:([._-])\1*[a-z0-9]+)*@([a-z][-a-z0-9]+\.)+[a-z]{2,6} — Wiktor Stribiżew
@WiktorStribiżew it worked. Yours is the answer. The catastrophic backtracked regex was created by me. That's why it worked like a charm. lol — Dhay
I think the downvote is due to the question itself - matching emails is so common a task that you can easily find a better regex for this by just searching SO via Google (I find Google search better than SO built-in one). — Wiktor Stribiżew

Wiktor Stribiżew · Accepted Answer · 2016-03-29T09:46:14

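The problem you have is catastrophic backtracking caused by the nested quantifiers at the beginning: ([a-z0-9]+(\.*|\_*|\-*))+. Here, the ., _ and - are all optional because of the * quantifier, so the subpattern effectively reduces to ([a-z0-9]+)+. When no @ follows, the engine tries every possible way of splitting a run of letters and digits between the inner and outer +, and the number of attempts grows exponentially with the length of the run.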

I suggest "unrolling" the first subpattern to make it linear:

i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*@([a-z][-a-z0-9]+\.)+[a-z]{2,6}

Or

i)[a-z0-9]+(?:([._-])\1*[a-z0-9]+)*@(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}

You may even remove \1* if you do not allow more than one ., _ or - between "words".

Also, there is no need to use \-* with alternation in (\.|\-*\.): the hyphen is already matched by the preceding character class, so this subpattern can be reduced to \. alone.
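As a rough illustration, here is the unrolled pattern applied to a small in-memory sample. This is only a sketch in AutoHotkey v1.1 syntax: the two example.com/example.org addresses are placeholders I made up (the real addresses in the question are not visible), and the variable names are hypothetical.

```autohotkey
; Unrolled pattern from above, with a non-capturing group for the domain part.
EmailRegEx := "i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*@(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}"

; Placeholder input: two well-formed addresses around the problematic line.
Sample := "john.doe@example.com`r`nelke engineering ltd.,@yahoo.com`r`njane_roe@example.org"

; Remove every email match; whatever is left is "other than emails".
Leftover := RegExReplace(Sample, EmailRegEx)

; Collapse runs of line breaks and trim leading/trailing ones.
Leftover := Trim(RegExReplace(Leftover, "\R+", "`r`n"), "`r`n")

MsgBox % "Text other than emails:`n" . Leftover
```

With this pattern the problematic line simply fails to match and is left in the output, so the remaining addresses are still processed.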

See the regex demo
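If you literally want to skip a line that makes the regex engine give up, rather than fix the pattern, one possible workaround is to run the replacement line by line and wrap each call in its own try. The sketch below assumes AutoHotkey v1.1, where RegExReplace throws inside a try block as described in the question; Leftover and Failed are hypothetical names. Fixing the pattern is still the better option.

```autohotkey
; The original (backtracking-prone) pattern from the question, kept here on purpose.
OriginalRegEx := "i)([a-z0-9]+(\.*|\_*|\-*))+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"

FileRead, Contents, test.txt
Leftover := ""   ; text that remains after emails are removed
Failed := ""     ; lines on which the regex engine raised an error

; Split on `n and strip `r so each iteration sees exactly one line.
Loop, Parse, Contents, `n, `r
{
    Line := A_LoopField
    try
        Rest := RegExReplace(Line, OriginalRegEx)
    catch e
    {
        ; Only this line is skipped; the loop carries on with the next one.
        Failed .= Line . "`n"
        continue
    }
    if (Rest != "")
        Leftover .= Rest . "`n"
}

MsgBox % "Left over after removing emails:`n" . Leftover . "`nLines that raised a regex error:`n" . Failed
```

Because each line gets its own try, an error on one line no longer prevents the valid addresses on the other lines from being matched.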
