How to replace pattern of repeating characters/words only at the beginning of the string?

Question

Note that this question is in the context of Julia, and therefore (to my knowledge) PCRE.

Suppose that you had a string like this:

"sssppaaasspaapppssss"

and you wanted to match, individually, the repeating characters at the end of the string (in the case of our string, the four "s" characters - that is, so that matchall gives ["s","s","s","s"], not ["ssss"]). This is easy:

r"(.)(?=\1*$)"

It's practically trivial (and easily used - replace(r"(.)(?=\1*$)","hell","k") will give "hekk" while replace(r"(.)(?=\1*$)","hello","k") will give "hellk"). And it can be generalised for repeating patterns by switching out the dot for something more complex:

r"(\S+)(?=( \1)*$)"

which will, for instance, independently match the last three instances of "abc" in "abc abc defg abc h abc abc abc".

Which then leads to the question... how would you match the repeating character or pattern at the start of the string, instead? Specifically, using regex in the way it's used above.

The obvious approach would be to reverse the direction of the above regex as r"(?<=^\1*)(.)" - but PCRE/Julia doesn't allow lookbehinds to have variable length (except where it's fixed-variable, like (?<=ab|cde)), and thus throws an error. The next thought is to use "\K" as something along the lines of r"^\1*\K(.)", but this only manages to match the first character (presumably because it "advances" after matching it, and no longer matches the caret).

For clarity: I'm seeking a regex that will, for instance, result in

replace("abc abc defg abc h abc abc abc",<regex here>,"hello")

producing

"hello hello defg abc h abc abc abc"

As you can see, it's replacing each "abc" from the start with "hello", but only until the first non-match. The reverse one I provide above does this at the other end of the string:

replace("abc abc defg abc h abc abc abc",r"(\S+)(?=( \1)*$)","hello")

produces

"abc abc defg abc h hello hello hello"

use this ^(\S+)(?:\s+\1)* and then do splitting on space character, — Avinash Raj
@AvinashRaj - if you read through the full question, you'll see that I'm wanting to know if it can be done without multiple steps - that is, just with a regex. I can do it with the end-of-string equivalent, with r"(.)(?=\1*$)" (or more generally r"(\S+)(?=( \1)*$)"). — Glen O
@KingMob - you can do that, if you only want to match in one direction. But it's not a regex solution, and it doesn't help if you want to match in both directions (for instance, to capture the first sho in each part of the string "shorts shoes shop shovel shortstuff shoplifter shopshop", and not the second sho in shopshop, you need to look in both directions - you're basically figuring out which part isn't the delimiter, in a sense) — Glen O
Well, you could replace (\S+)((?= \1)|.*) with "hello$2" (JavaScript for example), but unfortunately you need to be able to reference the submatches in the replacement string which AFAIK you can't do in Julia :-( — 1983

Casimir et Hippolyte Casimir et Hippolyte · Accepted Answer · 2015-07-19T17:32:47

You can use the \G anchor that matches the position after the previous match or at the start of the string. In this way you ensure the contiguity of results from the start of the string to the last occurrence:

\G(\S+)( (?=\1 ))?

demo

or to be able to match until the end of the string:

\G(\S+)( (?=\1(?: |\z)))?

How to replace pattern of repeating characters/words only at the beginning of the string?

2 Answers