0
votes

Is it possible using awk or sed to get the line number of a line such that it is the first line matching a regex after another line matching another regex?

In other words:

  1. Find line l1 matching regex r1. l1 is the first line matching r1.
  2. Find line l2 below l1. l2 matches regex r2. l2 is the first line matching r2, ignoring lines l1 and above.

Clarification: By match I mean partial match, for the most general solution. A partial match can of course be turned into a full-word match with \<...\> or a full-line match with ^...$.

Example input:

- - '787928'
  - stuff
- - '810790'
  - more stuff
- - '787927'
  - yet more stuff
- - '828055'
  - some more stuff
- - '828472'
  - some other stuff

If r1 is ^-.*787927.* and r2 is ^- I'd expect the output to be 7, i.e. the number of the line that says - - '828055'.

3
Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the help center and read How to Ask, and especially read Why is “Can someone help me?” not an actual question? – kvantour
Don't use the word pattern as it's highly ambiguous. Instead use the word regexp or string, whichever it is you mean, and clarify if you want partial, full-word, or full-line matches or something else. Additionally, whatever it is you're trying to do, post concise, testable sample input and expected output that fully covers all your requirements. – Ed Morton

3 Answers

3
votes

Input example :

world
zekfzlefkzl
fezekzevnkzjnz
hello
zeniznejkglz
world
eznkflznfkel
hello
zenilzligeegz
world

Command :

pat1="hello"; pat2="world"
awk -v pat1="$pat1" -v pat2="$pat2" 'pat1_match && $0 ~ pat2 {print NR; exit} $0 ~ pat1 {pat1_match = 1}' <input>

Output :

6
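
As a rough sketch, the same awk idea can be applied to the sample input from the question. The function name line_after and the temporary file are hypothetical, introduced only for illustration. The r2 test comes before the r1 test so that a line matching both regexes is never reported as its own successor.

```shell
# Hypothetical helper: print the number of the first line matching r2
# that lies strictly below the first line matching r1.
# Usage: line_after r1 r2 file
line_after() {
    awk -v r1="$1" -v r2="$2" '
        # r2 is only tested once r1 has matched on an earlier line
        seen && $0 ~ r2 { print NR; exit }
        # remember the first r1 match
        !seen && $0 ~ r1 { seen = 1 }
    ' "$3"
}

# Sample input copied from the question
sample=$(mktemp)
cat > "$sample" <<'EOF'
- - '787928'
  - stuff
- - '810790'
  - more stuff
- - '787927'
  - yet more stuff
- - '828055'
  - some more stuff
- - '828472'
  - some other stuff
EOF

result=$(line_after '^-.*787927' '^-' "$sample")
echo "$result"    # prints 7
rm -f "$sample"
```

Note that awk -v interprets backslash escapes in the assigned value, so a regex containing backslashes (such as \<...\>) would need them doubled when passed this way.
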

3
votes

For an input file that looks like this:

 1  pat2
 2  x
 3  pat1
 4  x
 5  pat2
 6  x
 7  pat1
 8  x
 9  pat2

you could use sed as follows:

$ sed -n '/pat1/,${/pat2/{=;q;};}' infile
5

which works like this:

sed -n '       # suppress output with -n
/pat1/,$ {     # for all lines from the first occurrence of "pat1" on...
    /pat2/ {   # if the line matches "pat2"
        =      # print line number
        q      # quit
    }
}' infile

The above fails if the first occurrence of pat1 is on the same line as pat2:

 1  pat2
 2  x
 3  pat1 pat2
 4  x
 5  pat2
 6  x
 7  pat1
 8  x
 9  pat2

For this input, the command prints 3 instead of the expected 5. With GNU sed, we can use this instead:

$ sed -n '0,/pat1/!{/pat2/{=;q;};}' infile
5

which works like this:

sed -n '     # suppress output
0,/pat1/! {  # for all lines after the first occurrence of "pat1"
    /pat2/ { # if the line matches "pat2"
        =    # print line number
        q    # quit
    }
}' infile

The 0 address is a GNU extension; using 1 instead would break if pat1 was on the first line.
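
To see the difference between the two commands side by side, here is a small sketch that rebuilds the second sample file (without the line-number column) and runs both. It assumes GNU sed is installed as sed. The variable names are made up for illustration.

```shell
# Second sample from above: line 3 matches both pat1 and pat2
infile=$(mktemp)
printf '%s\n' 'pat2' 'x' 'pat1 pat2' 'x' 'pat2' 'x' 'pat1' 'x' 'pat2' > "$infile"

# Range form: the range starts on the pat1 line itself, so line 3 is reported
inclusive=$(sed -n '/pat1/,${/pat2/{=;q;};}' "$infile")

# Negated 0,/pat1/ form (GNU sed only): every line up to and including
# the first pat1 line is skipped, so the search starts on line 4
exclusive=$(sed -n '0,/pat1/!{/pat2/{=;q;};}' "$infile")

echo "range form:   $inclusive"    # prints 3
echo "negated form: $exclusive"    # prints 5
rm -f "$infile"
```
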

0
votes

This might work for you (GNU sed):

sed -n '/^-.*787927.*/{:a;n;/^-/!ba;=;q}' file

On encountering a line that matches ^-.*787927.*, start a loop that replaces the current line with the next one until a line beginning with - is found, then print that line's number and quit.
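
The loop can also be written out one command per line, which makes each step easier to follow. This is only a sketch run against the sample from the question; the temporary file and variable names are assumptions. Written this way it no longer relies on a semicolon after the label, so it should be accepted by other POSIX sed implementations as well, though that has not been verified here.

```shell
# Sample input copied from the question
sample=$(mktemp)
printf '%s\n' \
    "- - '787928'" "  - stuff" \
    "- - '810790'" "  - more stuff" \
    "- - '787927'" "  - yet more stuff" \
    "- - '828055'" "  - some more stuff" \
    "- - '828472'" "  - some other stuff" > "$sample"

line=$(sed -n '
# Wait for the first line matching r1
/^-.*787927/ {
    # Loop start
    :a
    # Replace the pattern space with the next input line
    n
    # No r2 match yet: jump back to the loop start
    /^-/!ba
    # r2 matched: print the line number and stop
    =
    q
}' "$sample")

echo "$line"    # prints 7
rm -f "$sample"
```
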