
I've searched extensively for this, and there are similar questions, but I still haven't been able to figure this out.

My problem is that I have, among others, strings of this form:

%Aliquam hendrerit mollis pretium! Praesent id%
%molestie \*libero vel\%\% pulvinar? Sed%
\%% urna. \% Fusce% in *sapien %mau\*ris.%

I want to select everything between two % signs, ignoring any % that is preceded by a \. The first line is trivial, and I have somehow managed the second one. The third one, however, I just can't figure out. To clarify, from the text above I want to select the following:

"%Aliquam hendrerit mollis pretium! Praesent id%"

"%molestie \*libero vel\%\% pulvinar? Sed%"

"% urna. \% Fusce%"

"%mau\*ris.%"

I want to point out that the original text can be part of one long string without newlines, i.e. each of these lines does not necessarily start on a new line.

So far I have written the following regular expression, which seems to match everything except the last one:

(?<!\\)%([^%]*)(?!%\\)(?:%|(.*)%)(?<!\\%)

For the last one it selects:

"% urna. \% Fusce% in *sapien %mau\*ris.%"

Which is too much. I don't really understand why it does this; maybe it is because of the or-condition (the alternation) in my regex? Any help is much appreciated!
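The over-match can be reproduced outside Regex101. The sketch below uses Python's `re` module, which is an assumption since the question doesn't name a language; Python only accepts fixed-width lookbehinds, and both lookbehinds in this pattern are fixed-width. The comments trace what the engine does on the third line.

```python
import re

# The regex from the question, unchanged, as a raw string.
question_pattern = re.compile(r"(?<!\\)%([^%]*)(?!%\\)(?:%|(.*)%)(?<!\\%)")

# The third sample line; the raw string keeps the backslashes literal.
line = r"\%% urna. \% Fusce% in *sapien %mau\*ris.%"

match = question_pattern.search(line)
# The match starts at the second % (the first one is escaped).
# [^%]* stops right before the % of the escaped "\%". The first alternative
# matches that %, but (?<!\\%) then rejects it, so the engine falls back to
# the second alternative (.*)%, whose greedy .* runs to the LAST % on the line.
print(match.group(0))  # prints: % urna. \% Fusce% in *sapien %mau\*ris.%
```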

Can't you strip out the escaped characters in a second step? s/\\.//g. And I'm a little confused: do you want all escaped characters to be ignored, or only escaped percent signs? Your question states a different expected outcome than your example provides. - knittl
Problem is that escaped characters are allowed, so I can't just strip them out. And I'll edit the main post to hopefully clear up the confusion, sorry about that! EDIT: I have edited the main post with correct examples of what I want to be able to select. - Scheme

1 Answer


This regex will give you the expected result:

/(?<!\\)(%.*?(?<!\\)%)/

See this demo on Regex101.com.

Explanation

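1 - (?<!\\)% matches any % character that is not preceded by a backslash.

2 - .*? matches any characters lazily, i.e. as few as possible, so the match stops at the first unescaped % that follows instead of running on to the last one (the greedy (.*) in the question's second alternative is what runs on to the last % on the line).

3 - Putting (2) between two copies of (1) matches the shortest run of characters enclosed by two % characters that are not preceded by a backslash.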
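To check the pattern against the whole sample at once, here is a sketch in Python (again an assumption about the language; the /.../ delimiters are not used there). It joins the three sample lines into one string with no newlines, as the question says the real input may be, and collects every selection with `re.findall`.

```python
import re

# The answer's pattern without the /.../ delimiters.
answer_pattern = re.compile(r"(?<!\\)(%.*?(?<!\\)%)")

# The three sample lines joined into one long string with no newlines.
text = (
    r"%Aliquam hendrerit mollis pretium! Praesent id%"
    r"%molestie \*libero vel\%\% pulvinar? Sed%"
    r"\%% urna. \% Fusce% in *sapien %mau\*ris.%"
)

# With exactly one capturing group, findall returns that group's text per match.
for selection in answer_pattern.findall(text):
    print(selection)
```

This prints the four selections from the question, with the \* escapes kept, since a match is always a substring of the input. Python's `.` does not match a newline by default, so if a selection can itself span a line break, compile the pattern with `re.DOTALL`.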