Regex: match pattern as long as it's not in the beginning

votes

Assume the following strings:

aaa bbb ccc
bbb aaa ccc

I want to match aaa as long as it is not at the start of the string. I'm trying to negate it by doing something like this:

[^^]aaa

But I don't think this is right. I'm using preg_replace.

regex regex-negation

Are you only matching aaa? Replacing it with what? – Explosion Pills

6 Answers

votes

You can use a negative lookbehind to make sure it is not at the beginning: (?<!^)aaa

votes

Since I came here via Google search, and was interested in a solution that is not using a lookbehind, here are my 2 cents.

The [^^]aaa pattern matches any single character other than a literal ^, followed by aaa, anywhere inside a string. [^...] is a negated character class, and inside it ^ has no anchor meaning. Note that the first ^, right after [, is special because it denotes negation, while the second one is just a literal caret symbol.

Thus, ^ cannot be used inside [...] to denote the start of the string.
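To make the point above concrete, here is a minimal sketch using Python's re module (an assumption for illustration; the question itself uses PHP's preg_replace, but the character-class behaviour shown is the same). The helper name first_match is hypothetical.

```python
import re

# [^^] is a negated character class: "any one character except a literal ^".
PATTERN = re.compile(r"[^^]aaa")

def first_match(text):
    """Return the text of the first match, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(repr(first_match("bbb aaa ccc")))  # -> ' aaa' (the preceding space is part of the match)
print(repr(first_match("aaa bbb ccc")))  # -> None (no character precedes aaa)
print(repr(first_match("x^aaa")))        # -> None (the preceding character is a literal ^)
```

So the pattern happens to skip aaa at the start of the string, but only because it needs to consume one extra character, and it also wrongly rejects aaa after a literal caret.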

A solution is to use a negative lookaround. Either of these two works equally well. A lookbehind:

(?<!^)aaa

and a lookahead:

(?!^)aaa

Why does the lookahead work, too? Lookarounds are zero-width assertions, and anchors are zero-width as well: they consume no text. Literally speaking, (?<!^) checks that there is no start-of-string position immediately to the left of the current location, and (?!^) checks that there is no start-of-string position immediately to the right of the current location. Since neither consumes any text, both check the same location, which is why both work.
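A quick way to check that the two lookarounds behave the same is a sketch like the following. It uses Python's re module as a stand-in (an assumption; the question is about preg_replace), and the helper name and the zzz replacement are hypothetical.

```python
import re

LOOKBEHIND = r"(?<!^)aaa"  # no start-of-string position to the left
LOOKAHEAD = r"(?!^)aaa"    # no start-of-string position to the right

def replace_not_at_start(text, pattern, replacement="zzz"):
    """Replace every match of pattern in text with replacement."""
    return re.sub(pattern, replacement, text)

for pattern in (LOOKBEHIND, LOOKAHEAD):
    print(replace_not_at_start("aaa bbb ccc", pattern))  # -> aaa bbb ccc
    print(replace_not_at_start("bbb aaa ccc", pattern))  # -> bbb zzz ccc
    print(replace_not_at_start("aaa aaa", pattern))      # -> aaa zzz
```

Unlike [^^]aaa, neither pattern consumes the character before aaa, so only aaa itself is replaced.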

votes

If you don't want to use lookbehind then use this regex:

/.(aaa)/

And use captured group #1.
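As a rough sketch of reading group 1 rather than the whole match (Python's re is assumed here for illustration, and find_not_at_start is a hypothetical helper name):

```python
import re

def find_not_at_start(text):
    """Return (matched text, offset) of group 1, or None if aaa only occurs at offset 0."""
    m = re.search(r".(aaa)", text)
    if m is None:
        return None
    # group(0) would include the extra leading character; group(1) is just aaa.
    return m.group(1), m.start(1)

print(find_not_at_start("bbb aaa ccc"))  # -> ('aaa', 4)
print(find_not_at_start("aaa bbb ccc"))  # -> None
```

The leading . forces at least one character before aaa, which is what excludes offset 0.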

votes

This situation is the first time that I've seen lookarounds outperform \K. Interesting.

Typically, capture groups and lookarounds cost additional steps. But due to the nature of this task, the regex engine can navigate the string faster by searching for aaa first and then looking back for a start-of-string anchor.

I'll add a couple of \K patterns for comparison.

I am using the s pattern modifier in case the leading character might be a newline character (which . would not normally match). I added this to preemptively address a fringe case that might otherwise be raised.

Again, this is an enlightening scenario because in all other regex cases that I've dealt with \K beats out the other techniques.

Step Count Comparison Matrix:

Input string  | `~.\Kaaa~s` | `~.+?\Kaaa~s` | `(?<!^)aaa` | `(?!^)aaa` | `.(aaa)` |
--------------|-------------|---------------|-------------|------------|----------|
`aaa bbb ccc` |   12 steps  |    67 steps   |   8 steps   |  8 steps   | 16 steps |
`bbb aaa ccc` |   15 steps  |    12 steps   |   6 steps   |  6 steps   | 12 steps |

The takeaway is: to learn about the efficiency of your patterns, paste them into regex101.com and compare the step counts.

Also, if you know exactly what substring you are looking for and you don't need a regex pattern, then you should be using strpos() as a matter of best practice (and just check that the returned value is > 0).

...in other words:

if (strpos($haystack, 'aaa')) {
    // strpos() returned a "truthy" offset (neither false nor 0), so
    // 'aaa' is found and its first occurrence is not at offset zero
}
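One caveat with the plain substring check: it only looks at the first occurrence, so a string such as aaa aaa is rejected even though a second aaa sits later in the string. The sketch below illustrates this with Python's str.find as an analogue of strpos() (an assumption for illustration; both function names are hypothetical). Starting the search at offset 1 covers that case.

```python
def first_occurrence_not_at_start(haystack, needle="aaa"):
    """True if the FIRST occurrence of needle is past offset 0 (mirrors the check above)."""
    # str.find returns -1 when the needle is missing, so > 0 excludes both cases.
    return haystack.find(needle) > 0

def any_occurrence_not_at_start(haystack, needle="aaa"):
    """True if needle occurs anywhere at offset 1 or later."""
    return haystack.find(needle, 1) != -1

print(first_occurrence_not_at_start("bbb aaa ccc"))  # -> True
print(first_occurrence_not_at_start("aaa aaa"))      # -> False
print(any_occurrence_not_at_start("aaa aaa"))        # -> True
print(any_occurrence_not_at_start("aaa bbb ccc"))    # -> False
```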

votes

This will work to find what you are looking for:

(?<!^)aaa

Example in use: http://regexr.com?34ab2

votes

I came here looking for a solution for the re2 engine, used by Google Spreadsheets, which doesn't support lookarounds. The answers here gave me the idea of using the following. I don't understand why I have to replace with the captured group, but anyhow, it works.

aaa bbb ccc
bbb aaa ccc

([^^])aaa

Replace with:

$1zzz

Results in:

aaa bbb ccc
bbb zzz ccc
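The captured group is needed because [^^] consumes the character in front of aaa, so that character is part of what gets replaced and has to be put back. A minimal sketch of the difference, using Python's re module as a stand-in for re2 (an assumption; the function names are hypothetical, and Python writes the backreference as \1 where the answer uses $1):

```python
import re

def replace_with_group(text):
    """Capture the preceding character and restore it in the replacement."""
    return re.sub(r"([^^])aaa", r"\1zzz", text)

def replace_without_group(text):
    """Same pattern without restoring the consumed character."""
    return re.sub(r"[^^]aaa", "zzz", text)

print(replace_with_group("aaa bbb ccc"))     # -> aaa bbb ccc
print(replace_with_group("bbb aaa ccc"))     # -> bbb zzz ccc
print(replace_without_group("bbb aaa ccc"))  # -> bbbzzz ccc (the space is lost)
```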