
I am looking for an awk, sed, grep, or other bash solution to group lines into sets based on a pattern, then exclude every set that contains a word from a blacklist.

For example, given the input below, I'd like to print all sets that contain neither "hello" nor "idle". The blacklist may be expanded with more words in the future.

I tried awk and grep but could not come up with a good solution. grep -v only removes the matching line, not the whole set:

$ grep -v "hello" test.out | more
row1  set 1
row2
--
row1  set 2
row2
row3 is "fine"

Input file test.out:

row1  set 1 
row2
row3 is "hello"
--
row1  set 2 
row2
row3 is "fine"
--
row1  set 3 
row2
row3  is "idle"
--
row1  set 4 
row2
row3
...
--
row1  set n
row2
row3

Expected output:

row1  set 2 
row2
row3 is "fine"
--
row1  set 4 
row2
row3
...
--
row1  set n
row2
row3

2 Answers


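With GNU awk you can set the record separator (RS) to --, set the output record separator (ORS) to the same value so the separators are printed back, and then print only the records that contain neither hello nor idle: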

awk 'BEGIN{RS=ORS="--"};!(/hello/||/idle/)' file

row1  set 2
row2
row3 is "fine"
--
row1  set 4
row2
row3
...
--
row1  set n
row2
row3
--

The condition !(/hello/||/idle/) could also be written as !/hello/&&!/idle/ or, as Ed suggested, !/hello|idle/.
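Because the blacklist may grow, the alternation does not have to be hard-coded. A minimal sketch, assuming one word per line in a file called blacklist.txt and a shell variable bl (both hypothetical names), joins the words with paste and hands the resulting regex to awk with -v:

```shell
# Sample input (a shortened copy of test.out from the question).
printf '%s\n' 'row1  set 1' 'row2' 'row3 is "hello"' '--' \
              'row1  set 2' 'row2' 'row3 is "fine"' '--' \
              'row1  set 3' 'row2' 'row3  is "idle"' > test.out

# Hypothetical blacklist file: one word per line, easy to extend.
printf '%s\n' hello idle > blacklist.txt

# Join the words into a single alternation: hello|idle
bl=$(paste -s -d '|' blacklist.txt)

# Multi-character RS needs GNU awk (or another awk that supports it, e.g. mawk).
# A record is printed only if it does not match the blacklist regex.
out=$(awk -v RS='--' -v ORS='--' -v bl="$bl" '$0 !~ bl' test.out)
printf '%s\n' "$out"
```

The words are used as regular expressions and match as substrings, so a word containing . or * would need escaping, and idle would also match idler.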

Another separator could also be used, like this:

awk 'BEGIN{RS=ORS="row1  set"};!/hello/&&!/idle/' file
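POSIX leaves the behaviour of a multi-character RS unspecified, so on an awk that honours only the first character of RS the commands above may not group the sets correctly. A portable sketch, using a different technique from this answer (it buffers lines instead of changing RS), collects lines until a -- line and prints the buffer only if it does not match the blacklist; bl and emit_set are hypothetical names:

```shell
# Sample input (a shortened copy of test.out from the question).
printf '%s\n' 'row1  set 1' 'row2' 'row3 is "hello"' '--' \
              'row1  set 2' 'row2' 'row3 is "fine"' '--' \
              'row1  set 3' 'row2' 'row3  is "idle"' '--' \
              'row1  set 4' 'row2' 'row3' > test.out

out=$(awk -v bl='hello|idle' '
    # Print the buffered set (including its -- line) unless it is blacklisted,
    # then start a new, empty buffer.
    function emit_set() { if (buf != "" && buf !~ bl) printf "%s", buf; buf = "" }
    { buf = buf $0 "\n" }      # collect every line of the current set
    $0 == "--" { emit_set() }  # a -- line closes the set
    END { emit_set() }         # the last set has no trailing -- line
' test.out)
printf '%s\n' "$out"
```

Unlike the RS version, this prints no leading blank line and no trailing -- when the last set is kept; if the last set is blacklisted, the output ends with the -- line of the previous kept set.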

This might work for you (GNU sed):

sed -E ':a;N;$!{/^--$/M!ba};/hello|idle/d' file

Gather up lines until a line consisting only of -- is encountered; then, if the collected set contains either hello or idle, delete the whole set. Every other set is printed.
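The sed command can take its blacklist from a shell variable in the same way. A sketch assuming GNU sed (the M modifier is a GNU extension) and a hypothetical variable bl: the script stays in single quotes except for "$bl", so the shell never expands $! (inside double quotes it would become the PID of the last background job), and the branch is written ba;} so the label ends explicitly:

```shell
# Sample input (a shortened copy of test.out from the question).
printf '%s\n' 'row1  set 1' 'row2' 'row3 is "hello"' '--' \
              'row1  set 2' 'row2' 'row3 is "fine"' '--' \
              'row1  set 3' 'row2' 'row3  is "idle"' '--' \
              'row1  set 4' 'row2' 'row3' > test.out

# Hypothetical blacklist variable; add more words with | as needed.
bl='hello|idle'

# Same logic as above: append lines until a line that is exactly --
# (or the last line), then delete the set if it matches the blacklist.
out=$(sed -E ':a;N;$!{/^--$/M!ba;};/'"$bl"'/d' test.out)
printf '%s\n' "$out"
```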

An alternative, which collects each set in the hold space and prints it only if it contains neither word:

sed -nE 'h;:a;n;H;/^--$/!{$!ba};x;/hello|idle/!p' file