I need to identify each occurrence of a problem across many files. An occurrence is determined by a pattern that spans multiple lines.
In my case I'm trying to identify literals that have a leading space, contain more than one consecutive space character, or start with one of a set of known small words (e.g. 'or', 'and', etc.). Literals are delimited by single quotes. However, I'm only interested in literals where the line four lines beforehand contains the word "LITERAL".
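To pin down the three rules, here is a rough sketch in Python rather than grep. The function name and the small-word list are made up for illustration, and I'm assuming "small word" means the first word of the literal, as in 'or Bar'.

```python
# Hypothetical small-word list; the real one would be longer.
SMALL_WORDS = ("or", "and")


def literal_problems(literal):
    """Return the reasons a literal (the text between the single quotes) is flagged."""
    problems = []
    if literal.startswith(" "):
        problems.append("leading space")
    if "  " in literal:  # two or more spaces in a row
        problems.append("consecutive spaces")
    # Assumption: only the literal's first word is compared with the small words.
    words = literal.split()
    if words and words[0] in SMALL_WORDS:
        problems.append("small word")
    return problems
```

An empty list means the literal is fine, so 'Bar Foo' with its single separating space is not reported.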
Here is an example of the contents of a file:
EXEC LITERAL
LEVEL
NAME
LENGTH
VALUE (' Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Foo  Bar')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Bar Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
LENGTH
VALUE ('or Bar')
END EXEC
EXEC DEFINITION
LEVEL
NAME
LENGTH
VALUE ('Bar')
END EXEC
In the above example I would want the output to identify the file and list the occurrences ' Foo', 'Foo  Bar' (more than one space between the words) and 'or Bar'. Note that 'Bar Foo' is not included, because a single space separating words within the quotes is acceptable.
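To show the result I'm after, here is a line-by-line sketch in Python. It is only an illustration under several assumptions: the names are hypothetical, the small-word list is a stand-in, a block starts with a line reading EXEC LITERAL and is closed by any line starting with EXEC or END, and the VALUE line may be up to four lines after the header. In the sample inside the code the second literal has two spaces between Foo and Bar.

```python
import re

SMALL_WORDS = ("or", "and")  # hypothetical; the real list is longer
MAX_GAP = 4                  # assumption: VALUE is at most 4 lines after EXEC LITERAL

HEADER = re.compile(r"^\s*EXEC\s+LITERAL\s*$")
VALUE = re.compile(r"^\s*VALUE\s*\('(.*)'\)\s*$")


def is_bad(literal):
    """Leading space, two or more consecutive spaces, or a small first word."""
    first = literal.split(" ", 1)[0]
    return literal.startswith(" ") or "  " in literal or first in SMALL_WORDS


def flagged_literals(lines):
    found = []
    header_at = None  # index of the open EXEC LITERAL line, if any
    for i, line in enumerate(lines):
        if HEADER.match(line):
            header_at = i
            continue
        if line.split()[:1] in (["EXEC"], ["END"]):
            header_at = None  # some other block starts, or the block ends
            continue
        m = VALUE.match(line)
        if m and header_at is not None and i - header_at <= MAX_GAP:
            if is_bad(m.group(1)):
                found.append(m.group(1))
    return found


SAMPLE = """\
EXEC LITERAL
LEVEL
NAME
LENGTH
VALUE (' Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Foo  Bar')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Bar Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
VALUE ('Foo')
END EXEC
EXEC LITERAL
LEVEL
NAME
LENGTH
VALUE ('or Bar')
END EXEC
EXEC DEFINITION
LEVEL
NAME
LENGTH
VALUE ('Bar')
END EXEC
""".splitlines()

print(flagged_literals(SAMPLE))  # [' Foo', 'Foo  Bar', 'or Bar']
```

For real files I would call this once per file and print the file name next to each hit.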
I've been able to construct grep statements that identify multiple spaces, leading spaces and small words (via multiple pipes); however, I cannot seem to get grep to apply a regex across multiple lines. Another article mentioned using pcregrep for this. I'm happy to do that, but I'm a tad lost as to the regular expression to use.
So far I've got to the following command:
pcregrep -M 'LITERAL.*\n.*\n.*\n.*\n.*VALUE.* ' test.txt
Unfortunately it doesn't pick up the 'Foo Bar' example (because of the 4 x \n I presume). The next one picks up 'Foo Bar' but doesn't pick up 'or Bar':
pcregrep -M 'LITERAL.*\n.*\n.*\n.*\n.*VALUE.* ' test.txt
Also, when I tested with larger data sets, it picked up LITERAL in places that don't fit the above patterns (e.g. where it is part of another, unrelated word). I really need the expression to restrict matches to the given patterns, ignoring instances of VALUE or LITERAL that do not form part of the above example patterns.
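The closest I can get to describing that restriction is with Python's re module rather than pcregrep. It is a rough sketch under assumptions of my own: the header line reads exactly EXEC LITERAL, no line between the header and VALUE contains the word EXEC, the files use Unix line endings, and the small-word list is a stand-in. I haven't checked whether the same pattern carries over to pcregrep -M unchanged.

```python
import re

PATTERN = re.compile(
    # The header must be a whole line, so LITERAL inside another word is ignored.
    r"^[ \t]*EXEC[ \t]+LITERAL[ \t]*\n"
    # Up to three lines in between, none containing EXEC (keeps the match in one block).
    r"(?:(?!.*\bEXEC\b).*\n){0,3}"
    # VALUE line whose literal has a leading space, a double space, or a small first word.
    r"[ \t]*VALUE[ \t]*\('( .*|.*  .*|(?:or|and) .*)'\)[ \t]*$",
    re.MULTILINE,  # ^ and $ match at every line boundary
)


def find_bad_literals(text):
    """Return the offending literals found in the text of one file."""
    return PATTERN.findall(text)
```

A bounded repeat such as {0,3} in place of a fixed number of \n is what lets both the three-line and four-line gaps match.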
Any help in how to resolve this would be most welcome.