14
votes

Assume the following word sequence

BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA

What I would like to do is extract the text from BLA to LOOK, but starting from the BLA that is closest to LOOK. I.e., I would like to get

BLA text text text text LOOK 

How can I do that using regular expressions? I have one solution that works, but it is extremely inefficient.

BLA(?!.*?BLA.*?LOOK).*?LOOK

Is there a better and more performant way to achieve matching this pattern?

What I would like to do is: match BLA, then scan forward until either a positive lookahead finds LOOK or a negative lookahead finds BLA. But I don't know how to express this as a regular expression.

As the regex engine, I use the re module in Python.

3 Answers

18
votes
(?s)BLA(?:(?!BLA).)*?LOOK

Try this.

Alternatively, use

BLA(?:(?!BLA|LOOK)[\s\S])*LOOK

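This is safer: the lookahead also rejects LOOK, so the match ends at the first LOOK after BLA, and [\s\S] matches newlines without needing the (?s) flag.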
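
For reference, here is a minimal sketch of how both patterns could be run with Python's re module against the sample string from the question. The variable names (text, lazy, strict) are only illustrative and not part of the original answer.

```python
import re

# Sample string from the question (note the double space in the first run).
text = ("BLA text text text  text text text BLA text text text text "
        "LOOK text text text BLA text text BLA")

# Lazy tempered token: each consumed character must not start another BLA.
lazy = re.compile(r"(?s)BLA(?:(?!BLA).)*?LOOK")

# Stricter variant: each consumed character must not start BLA or LOOK.
strict = re.compile(r"BLA(?:(?!BLA|LOOK)[\s\S])*LOOK")

lazy_match = lazy.search(text).group()
strict_match = strict.search(text).group()

print(lazy_match)    # BLA text text text text LOOK
print(strict_match)  # BLA text text text text LOOK
```

Both patterns skip the first BLA, because the tempered token cannot cross the second BLA on its way to LOOK.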

0
votes

Another way to extract the desired text is to use the tempered greedy token technique, which matches a series of individual characters that do not begin an unwanted string.

r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b'

\bBLA\b        : match 'BLA' with word boundaries
(?:            : begin non-capture group
  (?!\bBLA\b)  : negative lookahead asserts following characters are not
                 'BLA' with word boundaries
  .            : match any character
)              : end non-capture group
*              : execute non-capture group 0+ times
\bLOOK\b       : match 'LOOK' with word boundaries

Word boundaries are included to avoid matching words such as BLACK and TRAILBLAZER.
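
To illustrate the effect of the word boundaries, here is a small sketch that compares the pattern above with a variant that has the \b anchors removed. The sample string and the variable names are made up for this illustration.

```python
import re

# Pattern from this answer, with word boundaries.
bounded = re.compile(r"\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b")

# Same idea without word boundaries, for comparison only.
unbounded = re.compile(r"BLA(?:(?!BLA).)*LOOK")

# Hypothetical input containing BLA inside longer words.
sample = "BLACK cat BLA text TRAILBLAZER text LOOK text"

bounded_match = bounded.search(sample).group()
unbounded_match = unbounded.search(sample).group()

print(bounded_match)    # BLA text TRAILBLAZER text LOOK
print(unbounded_match)  # BLAZER text LOOK
```

Without the boundaries, the BLA inside TRAILBLAZER is treated as a new starting point, so the match begins in the middle of a word. Also keep in mind that . does not match newlines unless re.DOTALL (or (?s)) is used.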

-1
votes

Simply find the text between BLA and LOOK that does not contain BLA:

In : re.search(r'BLA [^(BLA)]+ LOOK', 'BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA').group()
Out: 'BLA text text text text LOOK'

:-)
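
One caveat worth checking before relying on this: [^(BLA)] is a character class, so it excludes the single characters (, B, L, A and ) rather than the word BLA. The sketch below compares it with a tempered token on a made-up input. The sample string and variable names are assumptions for illustration.

```python
import re

# Character class: rejects any single '(', 'B', 'L', 'A' or ')'.
char_class = re.compile(r"BLA [^(BLA)]+ LOOK")

# Tempered token: rejects only positions where the word BLA starts.
tempered = re.compile(r"BLA (?:(?!BLA).)+ LOOK")

# Hypothetical input whose middle part contains a capital A.
sample = "BLA some Apples LOOK"

class_result = char_class.search(sample)
tempered_result = tempered.search(sample).group()

print(class_result)     # None
print(tempered_result)  # BLA some Apples LOOK
```

On the string from the question both patterns return the same result, because the text between BLA and LOOK happens to be lowercase only.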