1
votes

I need a regex that matches the closest [A-Za-z_0-9]*\.xml before ERROR.

For the following input, the match should be match_me.xml:

  1. ERROR should be located.
  2. The first [A-Za-z_0-9]*\.xml before ERROR (searching backwards from it) should be matched. -> match_me.xml

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo

match_not_me.xml

dolores et ea rebum.

match_me.xml

Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna

ERROR

aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

2 Answers

1
votes

You can use awk for this:

awk 'p && /ERROR/{print p; p=""} /^[A-Za-z_0-9]*\.xml$/{p=$0}' file

match_me.xml

  1. When a line matches our pattern, we store that line in the variable p.
  2. When p is set and we encounter ERROR we print p and reset it to blank.
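
As a rough way to try the two steps above, the sketch below builds a small sample file and runs the command on it. The temporary file and the shortened sample lines are assumptions made for illustration. Note that the reset is written as the assignment p="" (a single =), since p=="" would only compare and leave p unchanged.

```shell
# Hypothetical sample input, shortened from the text in the question.
sample=$(mktemp)
printf '%s\n' \
  'match_not_me.xml' \
  'dolores et ea rebum.' \
  'match_me.xml' \
  'Stet clita kasd gubergren' \
  'ERROR' \
  'aliquyam erat' > "$sample"

# Remember the latest line that consists only of a *.xml name; when ERROR
# appears and a name is remembered, print it and clear it by assignment.
result=$(awk 'p && /ERROR/ {print p; p=""} /^[A-Za-z_0-9]*\.xml$/ {p=$0}' "$sample")
echo "$result"

rm -f "$sample"
```

On this sample the script should print match_me.xml, because that is the last stored name when ERROR is reached.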
1
votes

grep (PCRE) solution:

grep -Poz 'ERROR[\s\S]+?\s\K[A-Za-z_0-9]+\.xml' <(tac file) && echo
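  • tac file - prints the lines of the file in reverse order (last line first), so the name closest to ERROR is the first one found after it

  • -z - treats the input as NUL-terminated records, so an ordinary text file is read as one record and the pattern can span lines

  • [\s\S]+? - matches any character, including newlines, in a "non-greedy" manner

  • \K - discards everything matched so far, so only the file name is reported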

The output:

match_me.xml
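
To experiment with this approach, the same pattern can be fed from a pipe instead of process substitution. This is only a sketch: it assumes GNU grep built with PCRE support (-P) and GNU tac, and the temporary sample file is made up for illustration.

```shell
# Hypothetical sample input, shortened from the text in the question.
sample=$(mktemp)
printf '%s\n' \
  'match_not_me.xml' \
  'dolores et ea rebum.' \
  'match_me.xml' \
  'Stet clita kasd gubergren' \
  'ERROR' \
  'aliquyam erat' > "$sample"

# Reverse the lines so the name nearest to ERROR is the first one after it.
# -z makes newlines ordinary characters; tr drops the NUL that -z appends
# to each reported match.
nearest=$(tac "$sample" | grep -Poz 'ERROR[\s\S]+?\s\K[A-Za-z_0-9]+\.xml' | tr -d '\0')
echo "$nearest"

rm -f "$sample"
```

Piping through tr also replaces the trailing && echo, since the command substitution and echo supply the final newline.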