4
votes

I need help with a RegEx problem:

I want to find occurrences of two known words ("foo" and "bar", for example) that have any whitespace other than EXACTLY ONE SPACE CHARACTER between them.

In the text that I have to grep, there may be spaces, tabs, CRs, LFs or any combination of them between the two words.

In regex terms: I need one regular expression that matches "foo[ \t\n\r]+bar" but does NOT match "foo bar".

Everything I've tried so far either missed some combinations or also matched the single-space case, which is the only one that should NOT match.

Thanks in advance for any solutions.

EDIT: To clarify, I'm using Perl compatible RegEx here.


3 Answers

4
votes

You could also use a negative lookahead:

foo(?! \b)\s+bar

If lookaheads are not supported you can write it explicitly:

foo(?:[^\S ]| \s)\s*bar

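The expression [^\S ] is a double negative, so it might not be immediately obvious how it works. \S means "not whitespace", so [^\S ] means "neither non-whitespace nor a space", i.e. any whitespace character other than a space. The second alternative, " \s", covers a space followed by at least one more whitespace character.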
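To check both patterns above against the cases from the question, here is a minimal sketch. It assumes Python's `re` module as a stand-in for a Perl-compatible engine (both constructs used here behave the same way in it); the names `LOOKAHEAD`, `EXPLICIT`, `SAMPLES` and `check` are made up for illustration.

```python
import re

# The two patterns from this answer, unchanged.
LOOKAHEAD = re.compile(r"foo(?! \b)\s+bar")
EXPLICIT = re.compile(r"foo(?:[^\S ]| \s)\s*bar")

# Input text -> whether it should match.
SAMPLES = {
    "foo bar": False,     # exactly one space: the only whitespace case to reject
    "foo  bar": True,     # two spaces
    "foo\tbar": True,     # single tab
    "foo\nbar": True,     # single line feed
    "foo\r\nbar": True,   # CR + LF
    "foo \tbar": True,    # space then tab
    "foo\t bar": True,    # tab then space
    "foobar": False,      # no whitespace at all
}


def check(pattern):
    """Return {text: matched?} for every sample, using the given compiled pattern."""
    return {text: bool(pattern.search(text)) for text in SAMPLES}


if __name__ == "__main__":
    for name, pattern in (("lookahead", LOOKAHEAD), ("explicit", EXPLICIT)):
        ok = check(pattern) == SAMPLES
        print(name, "behaves as expected" if ok else "differs from expectation")
```

Both patterns should agree with the expected column on every sample, including the mixed space/tab cases.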

1
votes

You could use (assuming ERE, i.e. grep -E)

foo[[:space:]]{2,}bar

The syntax x{min,} means the pattern x must appear at least min times.


If by "other than EXACTLY ONE SPACE CHARACTER" you mean anything except a single 0x20 space character (so a lone tab, CR or LF should still match), you need an alternation:

foo([\t\n\r]|[ \t\n\r]{2,})bar
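
A quick way to sanity-check the alternation is sketched below. It assumes Python's `re` module rather than grep, since the question mentions a Perl-compatible engine and the escapes `\t`, `\n`, `\r` inside brackets are understood there; `ALTERNATION` and `matches` are illustrative names only.

```python
import re

# First branch: a single whitespace character that is not a space.
# Second branch: two or more whitespace characters of any kind.
ALTERNATION = re.compile(r"foo([\t\n\r]|[ \t\n\r]{2,})bar")


def matches(text):
    """True if the two words are separated by anything but exactly one space."""
    return ALTERNATION.search(text) is not None


if __name__ == "__main__":
    for sample in ("foo bar", "foo\tbar", "foo  bar", "foo \nbar", "foo\t bar"):
        print(repr(sample), matches(sample))
```

Only the single-space sample should be rejected; the lone tab goes through the first branch and every longer run of whitespace goes through the second.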
0
votes

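Use [[:space:]]{2,}

{2,} means 2 or more. Note that the POSIX class [:space:] is only valid inside a bracket expression, hence the double brackets.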