2
votes

Ok, so I have a phrase "foo bar" and I want to find everything BUT "foo bar".
Here's my text.

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

There's a way to do this purely within regex, right? I don't have to go and use string functions etc., do I?

RESULT:

NOTE: I can't do proper highlighting, but the bold gives you an idea. (The spaces before and after each phrase would also be selected, but including them breaks the bolding.)

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

Assume PCRE nomenclature.


UPDATE 7/29/2013: it may be better to use a search and replace function in your language of choice to just 'remove' the phrases that you don't want so that you are then left with the info you do want.
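A minimal sketch of that search-and-replace idea, assuming Python's re module rather than PCRE itself (the pattern is a plain literal, so the two should agree here). The names TEXT and remove_phrase are made up for illustration:

```python
import re

# The sample text from the question.
TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""


def remove_phrase(text, phrase="foo bar"):
    """Delete every literal occurrence of phrase, leaving the rest untouched."""
    # re.escape keeps the phrase literal even if it contains regex metacharacters.
    return re.sub(re.escape(phrase), "", text)


if __name__ == "__main__":
    print(remove_phrase(TEXT))
```

Note that the spaces on either side of each removed phrase are left in place, so the result may contain doubled spaces.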

6
Tell us what operating system. Tell us what programming language. Tell us the EXACT STRING you are starting with. Tell us the EXACT STRING you expect as an answer. Tell us what you are not allowed to use AND WHY. – tchrist
See question for 'exact string'; see question for expected result. I'm not allowed to use walruses, the colour blue, or cheeky phrases concerning Vogon poetry. I am allowed to use regex (with PCRE rules)....Why....who knows...but my guess has something to do with the walrus...or possibly his hat. – Keng

6 Answers

9
votes

In general, if foobar matches itself, then (?s:(?!foobar).)* matches anything that is not foobar, including nothing at all.

You could use that to find lines that don’t have foobar in them, for example, using

^(?:(?!foobar).)*$
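As a rough illustration, here is that line filter tried from Python, whose re module is assumed to handle this lookahead construct the same way PCRE does. The phrase is the question's "foo bar" rather than foobar, and re.MULTILINE is added so that ^ and $ anchor at every line; TEXT and LINE_WITHOUT are illustrative names:

```python
import re

# The sample text from the question.
TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# Each "." is only allowed to match where "foo bar" does not start,
# so a line matches only if the phrase appears nowhere in it.
LINE_WITHOUT = re.compile(r"^(?:(?!foo bar).)*$", re.MULTILINE)

if __name__ == "__main__":
    for line in LINE_WITHOUT.findall(TEXT):
        print(line)
```

On the sample text only the second line survives, since it is the only one that does not contain the phrase.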

You could also use your language’s split() function to split on foobar, which will give you all the pieces that do not include the split pattern.
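The split approach might look like this in Python; re.split is used here only as one example of such a function, and split_on_phrase is a hypothetical helper name:

```python
import re

# The sample text from the question.
TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""


def split_on_phrase(text, phrase="foo bar"):
    """Return the pieces of text that lie between occurrences of phrase."""
    # re.escape keeps the phrase literal; the pieces never contain it.
    return re.split(re.escape(phrase), text)


if __name__ == "__main__":
    for piece in split_on_phrase(TEXT):
        print(repr(piece))
```

A phrase at the very start or end of the text produces an empty piece, which can be filtered out if it is not wanted.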

Regarding the nasty little-known backtracking control verbs like (*FAIL) and (*COMMIT), I haven’t yet had much occasion to use them in ‘non-toy’ programs. I find that independent subexpressions via (?>...) and the possessive quantifiers *+, ++, ?+ etc. give me more than enough rope, so to speak.

That said, I do have one toy example of using (*FAIL) in this answer; it’s the very first regex solution. The reason for its being there was I wanted to force the regex engine to backtrack through all possible permutations; the real goal was merely to count how many ways it tried things.

Please understand that my two regexes there, along with the many, many incredibly creative answers from others, are all meant to be fun, tongue-in-cheek things. Still, one can learn a lot from them — once one recovers from shock. ☺

4
votes

Try:

^(?!.*foo bar).*$

This should select every line that does not contain "foo bar". The (?!...) construct is a negative lookahead.

2
votes

"remove everything except foo bar" is equivalent to "find only foo bar", which PCRE allows quite easily. Conversely, "find everything except foo bar" is equivalent to "find and remove only foo bar". So, complementation is easily done from your tools.

Aside from that, PCRE has a nasty little feature known as (*FAIL), which forces an immediate backtrack wherever it's encountered. So I suppose inserting something like (*COMMIT)foo bar(*FAIL) into your regular expression could help. It's neither friendly nor very safe, though.

1
votes

Okay, so you want to remove everything except foo bar using UltraEdit's "Advanced" (Perl-regex style) search feature. The easiest way to do that is to match everything, but only capture foo bar, like this:

(?:(?!foo bar).)+(foo bar|$)

...and replace it with $1 or \1 (whichever style UltraEdit accepts).

I don't use UltraEdit, but EditPad Pro converts this:

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar 

...to this:

foo bar

foo bar
foo bar

...which is the result you showed in your original message.
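For anyone without either editor, the same replacement can be sketched in Python. This assumes Python's re behaves like the editors' Perl-style engines for this pattern, and it adds re.MULTILINE so that $ matches at the end of every line, as it typically does in an editor. TEXT and keep_only_phrase are illustrative names:

```python
import re

# The sample text from the question.
TEXT = """ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar"""

# Consume text that is not "foo bar", then capture either the phrase
# itself or the (empty) end of the line.
PATTERN = re.compile(r"(?:(?!foo bar).)+(foo bar|$)", re.MULTILINE)


def keep_only_phrase(text):
    """Replace each match with its capture, dropping everything else on the line."""
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(keep_only_phrase(TEXT))
```

On the sample text this leaves foo bar on the first, third and fourth lines and an empty second line.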

1
votes

Here: perl -pe 's{.*?(foo bar|$)}{$1}g' <text

I want to find everything BUT "foo bar"

A match-only pattern that does not rely on substituting $1 (that is, one usable with an empty replacement, as in s{pattern}{})... I'm not sure that is possible. You would have to gobble up characters up to foo bar, e.g. with .*?(?=foo bar). But then the matcher resumes at that very foo bar, and the next match swallows it (or its tail, such as "oo bar", depending on how the engine advances after an empty match).
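The problem can be observed with a quick experiment. This sketch uses Python's re rather than Perl, so exactly where the follow-up match begins is an assumption that may differ between engines; SAMPLE and lookahead_matches are made-up names:

```python
import re

SAMPLE = "a foo bar b foo bar"


def lookahead_matches(text):
    """Collect every match of the lazy 'up to foo bar' pattern."""
    return [m.group(0) for m in re.finditer(r".*?(?=foo bar)", text)]


if __name__ == "__main__":
    # The first match is the text before the first "foo bar"; later
    # matches include empty strings and pieces of the phrase itself.
    for piece in lookahead_matches(SAMPLE):
        print(repr(piece))
```

The first match is clean, but a later one contains part of the phrase that was supposed to be excluded, which is why the lookahead alone is not enough.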

Continuing the quest, here is a piece of Perl code that gobbles up the requested parts. Its only drawback is that empty captures may be returned, for instance when foo bar sits at the start of the line:

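while (my $line = <>) {
        chomp $line;
        # Capture each run of text that ends at "foo bar" or at end of line.
        my @parts = $line =~ m{(.*?)(?:foo bar|$)}gs;
        print "[[ $_ ]]\n" for @parts;
}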

There is no substitution involved, and running this on the Lorem ipsum test file will show everything but the foo bar parts. It is PCRE compatible, but there is no guarantee that $EDITOR will do what you envision.

1
votes

To show every line that contains neither "foo bar" nor "fad bad", this worked for me:

^(?!.*foo bar)(?!.*fad bad).*$
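The same pattern can be assembled for any number of phrases. Here is a small sketch in Python, assuming its re module treats these lookaheads as PCRE does; lines_without is a hypothetical helper, and re.MULTILINE is added so the anchors apply per line:

```python
import re


def lines_without(text, phrases):
    """Return the lines of text that contain none of the given phrases."""
    # One negative lookahead per phrase, all checked from the start of the line.
    lookaheads = "".join(f"(?!.*{re.escape(p)})" for p in phrases)
    pattern = re.compile(f"^{lookaheads}.*$", re.MULTILINE)
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "keep me\nhas foo bar\nhas fad bad\nkeep me too"
    print(lines_without(sample, ["foo bar", "fad bad"]))
```

Each extra phrase costs one more scan of the line, which is rarely a concern for a handful of phrases.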