2
votes

I've been searching here and got close, but nothing quite does what I'm trying to do. For example, consider the following sample test input. The objective is to find matches that span multiple lines: each match starts with a line that contains "abc" (print this line) and ends with a line that contains "efg" (also print this line), and the lines in between should be printed too.

yyabc}
000
iiabc<
    {efg+1}
111
yyabc}
222
 p  {efg+13}
zzz
   z   {efg+243} {}
iii
oooabc>
ooo

The closest I've come to what I'm looking for, with zzz as the test input file containing the lines above, is:

sed -e '/abc/,/efg/!d' zzz

but it includes extra lines that I'd rather not have:

yyabc}   <<***** extra
000      <<***** extra
iiabc<
    {efg+1}
yyabc}
222
 p  {efg+13}
oooabc>  <<***** extra
ooo      <<***** extra

So the expected output is:

iiabc<
    {efg+1}
yyabc}
222
 p  {efg+13}

Other than relying on pcregrep (I have everything else on the Linux box), is there a solution that can produce this kind of multi-line match?

Thanks much.

6 Answers

1
votes

awk is well suited to this task. If your test input file is called zzz, then run:

$ awk '/abc/{a=""} /abc/,/efg/{a=a"\n"$0} /efg/{print substr(a,2);a=""}' zzz
iiabc<
    {efg+1}
yyabc}
222
 p  {efg+13}

Explanation:

  • /abc/{a=""}

    Every time that a line containing "abc" is reached, set the variable a to an empty string. (The lines that we want to print will be added to this variable in the next step.)

  • /abc/,/efg/{a=a"\n"$0}

    Over every range of lines that starts with a line containing abc and ends with a line containing efg, each line is appended to the variable a.

  • /efg/{print substr(a,2);a=""}

    When the last line in the range is reached, print out a. Because a begins with an extra newline character, we use substr to remove it.

Without the first step above, the program runs fine but the "extra" lines would be printed. With the first step included, they are eliminated.
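To try the steps above without creating a file, here is a minimal sketch that feeds the question's sample input to the same awk program through a pipe. The `sample` and `result` variable names are only for illustration, and the `a != ""` guard is an addition that is not in the one-liner above: in the sample, the `{efg+243}` line lies outside any range, and without the guard `print substr(a,2)` would emit an empty line for it.

```shell
# Sample input from the question, kept in a variable (name is illustrative).
sample='yyabc}
000
iiabc<
    {efg+1}
111
yyabc}
222
 p  {efg+13}
zzz
   z   {efg+243} {}
iii
oooabc>
ooo'

# Same three rules as above; the a != "" test is an added guard (assumption)
# so that an efg line outside any abc..efg range prints nothing.
result=$(printf '%s\n' "$sample" |
    awk '/abc/         { a = "" }
         /abc/,/efg/   { a = a "\n" $0 }
         /efg/         { if (a != "") print substr(a, 2); a = "" }')

printf '%s\n' "$result"
```

With the sample input this should print only the two wanted blocks, five lines in total.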

1
votes
sed -n '/abc/,/efg/ {
   H
   /efg/ {
      g
:a
      s/^.*\n\(.*abc\)/\1/
      ta
      p
      }
   }' zzz

This uses the hold buffer to collect the part between abc and the first efg, then removes any lines before the last abc line, and finally prints the result and continues with the rest of the text.

It does not work if abc is on the same line as efg with no earlier abc in the same part of the text, because a sed range /start/,/end/ runs from a line matching the start pattern until a line matching the end pattern on ANOTHER, later line.
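The limitation is easy to see in isolation. In this small sketch (the four-line input is made up for illustration), the first line contains both abc and efg, yet sed only starts looking for the end pattern on the following line, so the range runs on to the next efg:

```shell
# Made-up input: line 1 matches both the start and the end pattern.
result=$(printf '%s\n' 'x abc efg' 'foo' 'bar efg' 'tail' |
    sed -n '/abc/,/efg/p')

# The range does not close on line 1; it runs through "bar efg".
printf '%s\n' "$result"
```

Here the first three lines are printed and only `tail` is left out.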

1
votes

Using a perl one-liner that slurps the entire file:

perl -0777 -ne 'print /.*abc.*\n(?:(?!.*(?:abc|efg)).*\n)*.*efg.*\n/g' file.txt

Or a line by line buffered solution:

perl -ne '
    $b = /abc/ ? $_ : "$b$_";
    print $b if (/abc/ .. /efg/) =~ /E/
  ' file.txt

Switches:

  • -0777: Slurp the entire file.
  • -n: Creates a while(<>){...} loop for each “line” in your input file.
  • -e: Tells perl to execute the code on command line.
1
votes

This might work for you (GNU sed):

sed -n '/abc/,/efg/{/abc/{h;d};H;/efg/{g;p}}' file

This uses sed in "grep" mode by invoking the -n switch. It filters the lines of interest between abc and efg, using the hold space (HS) to store the lines inclusively and then print them out.

Alternative:

sed -n '/abc/,/efg/{/abc/h;//!H;/efg/{x;p}}' file
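For readers less used to the hold space, here is the first command spelled out one command per line with comments. This is only a sketch: the `sample` variable and the shortened input are illustrative, and the logic is meant to be the same as the one-liner. Writing it this way should also make it acceptable to non-GNU seds, though that is an assumption rather than something tested here.

```shell
# Shortened version of the question's input (illustrative).
sample='yyabc}
000
iiabc<
    {efg+1}
111
oooabc>
ooo'

result=$(printf '%s\n' "$sample" | sed -n '
/abc/,/efg/ {
    # A line containing abc restarts the buffer: copy it to the hold
    # space, then end this cycle without printing anything.
    /abc/ {
        h
        d
    }
    # Any other line in the range is appended to the hold space.
    H
    # On the closing efg line, fetch the buffered block and print it.
    /efg/ {
        g
        p
    }
}')

printf '%s\n' "$result"
```

The trailing `oooabc>` block never reaches an efg line, so it is buffered but never printed.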
0
votes
(.*?abc(?:(?:(?!efg|abc).)|\n)*efg.*$)

Try this through perl.

See demo.

http://regex101.com/r/bA0jG5/11

0
votes

A straightforward array-based awk solution:

awk '/abc/ {delete a;j=0;flag=1}
     flag    {a[++j]=$0}
     /efg/ && flag {for (i=1;i<=j;i++){print a[i]};flag=0}' inputfile

/abc/ {delete a;j=0;flag=1}: when the start pattern is found, delete the array, set the counter to zero and turn on the "found" flag.

flag {a[++j]=$0}: store the line content while the flag is on.

/efg/ && flag {for (i=1;i<=j;i++){print a[i]};flag=0}: when the end pattern is found and the flag is on, print the array and turn the flag off.
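If the two patterns need to vary, the same flag-and-buffer idea can be wrapped in a small shell function. This is a sketch under assumptions: `print_blocks`, `startre`, `endre`, `buf` and `inblock` are names invented here, the function reads from stdin only, and because the patterns are passed with `-v`, awk will process backslash escapes in them.

```shell
# print_blocks START END: hypothetical helper, reads lines from stdin.
print_blocks() {
    awk -v startre="$1" -v endre="$2" '
        $0 ~ startre          { n = 0; inblock = 1 }   # (re)start the buffer
        inblock               { buf[++n] = $0 }        # collect while inside a block
        inblock && $0 ~ endre {                        # block complete: print, reset
            for (i = 1; i <= n; i++) print buf[i]
            inblock = 0
        }'
}

# Usage on a shortened version of the sample input.
printf '%s\n' 'yyabc}' '000' 'iiabc<' '    {efg+1}' '111' 'oooabc>' |
    print_blocks abc efg
```

Resetting `n` is enough to discard the old block, since entries beyond `n` are never read. Because the flag is set and tested on the same line, a single line containing both patterns is printed as a one-line block.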