Regex greedy parse direction

Question

I've found two different opinions about how a greedy regex quantifier is executed:

One is that the engine reads the entire input string and matches from the back: the first match attempt is the entire string. Sources supporting this view include the official Oracle Java tutorial:

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

Also see this article: Performance of Greedy vs. Lazy Regex Quantifiers

The other is that matching starts from the front: the first match attempt begins at index 0 on the left. When a match is found, the engine doesn't stop; it keeps matching the rest until it fails, and then it backtracks. The article I found supporting this view is:

Repetition with Star and Plus, whose "Looking Inside The Regex Engine" section discusses <.+>:

The first token in the regex is <. This is a literal. As we already know, the first place where it will match is the first < in the string.

I want to know which one is correct. This matters because it affects the efficiency of a regex. I added various language tags because I want to know whether it's implemented differently in each language.

Both articles say the same thing, I mean your first bullet point. Read the Repetition ... link again. — Prince John Wesley
Do you have a regexp you'd like us to examine? Would you like to profile various regexps? — shinronin
@Prince John Wesley Yes, both articles in the first bullet point share the same opinion. The question is that some people hold a different opinion, as in the second bullet point, and I don't know which one to follow. — Sawyer
Which link supports the second bullet point? The last link, Repetition with Star and Plus, says the same as the first bullet point. — Prince John Wesley
"keep matching the rest until it fails then it'll backtrack": how will the parser know the match has failed until it has eaten everything? — Prince John Wesley

David B · Accepted Answer · 2012-06-14T15:18:31

Assuming they're functionally equivalent (and based on my Java regex use, they are), it's just a difference in engine implementation. Regular expressions are not implemented exactly the same way in all languages, and can be more or less powerful depending on which language you use.

The second link describes Perl, so I'd trust Oracle on the Java side of things.

Both attempt to get the largest match possible.
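To make the two descriptions concrete, here is a toy trace of how a backtracking engine might handle <.+>. It is only a sketch, not a real regex engine: the function name greedy_tag_match and the sample string are made up for illustration, and it assumes the input has no newlines. The literal < is located scanning from the left, then .+ takes everything that remains and gives back one character at a time until > can match.

```python
import re


def greedy_tag_match(text):
    """Toy backtracking trace for the pattern <.+> (hypothetical helper, not a real engine)."""
    attempts = []  # what '.+' holds on each attempt, in order
    for start in range(len(text)):
        if text[start] != "<":
            continue  # the literal '<' has to match first, scanning left to right
        # '.+' first takes everything after '<', then backs off one character at a time
        for end in range(len(text), start + 1, -1):
            attempts.append(text[start + 1:end])  # always at least one character
            if end < len(text) and text[end] == ">":
                return text[start:end + 1], attempts
    return None, attempts


sample = "This is a <EM>first</EM> test"
match, attempts = greedy_tag_match(sample)
print(match)          # overall match found by the toy trace
print(attempts[0])    # first attempt: '.+' holds the whole rest of the string
print(len(attempts))  # number of attempts before '>' could match

# Python's re module (a backtracking engine) finds the same overall match.
print(re.search(r"<.+>", sample).group() == match)
```

Seen this way, the two descriptions need not conflict: the match position is found from the left, and the greedy quantifier then works backward from the end of the input.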

Making a quantifier lazy by appending ? to it makes it attempt the smallest match possible instead.
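A quick way to see the difference is to run both forms against the same input. This sketch uses Python's re module rather than java.util.regex, and the sample string is only an illustration; I'd expect the same results from Java, but check on your own engine.

```python
import re

text = "This is a <EM>first</EM> test"

greedy = re.search(r"<.+>", text)   # greedy: longest match from the first '<'
lazy = re.search(r"<.+?>", text)    # lazy: shortest match from the first '<'

print(greedy.group())  # <EM>first</EM>
print(lazy.group())    # <EM>
print(greedy.start(), lazy.start())  # both begin at the same index
```

Both matches start at the same position; only how far the quantifier extends differs.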
