In my grammar I have something like this:
    line : startWord ( matchPhrase
                     | anyWord matchPhrase
                     | anyWord anyWord matchPhrase
                     | anyWord anyWord anyWord matchPhrase
                     | anyWord anyWord anyWord anyWord matchPhrase
                     )
           -> ^(TreeParent startWord anyWord* matchPhrase);
I want to match the first occurrence of matchPhrase while allowing up to a fixed number of anyWord tokens before it. The tokens that make up matchPhrase are also matched by anyWord.
Is there a better way of doing this?
I think it might be possible by combining the semantic predicate in this answer with the non-greedy option:
(options {greedy=false;} : anyWord)*
but I can't figure out exactly how to do this.
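To pin down the behavior I'm after (leave the loop as soon as matchPhrase can start, but never take more than N anyWord tokens), here is a rough model of it as a hand-written recursive-descent loop. This is a Python sketch under my own assumptions, not ANTLR-generated code: the skip counter plays the role of a semantic predicate such as {n < N}?, and trying matchPhrase before consuming another anyWord plays the role of greedy=false. The names literal_phrase and parse_line are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A phrase matcher takes (tokens, pos) and returns the end position on success, else None.
PhraseMatcher = Callable[[List[str], int], Optional[int]]


def literal_phrase(words: List[str]) -> PhraseMatcher:
    """Hypothetical stand-in for matchPhrase: an exact sequence of words."""
    def match(tokens: List[str], pos: int) -> Optional[int]:
        end = pos + len(words)
        return end if tokens[pos:end] == words else None
    return match


def parse_line(tokens: List[str], start_word: str, match_phrase: PhraseMatcher,
               max_skip: int) -> Optional[Tuple[str, List[str], List[str]]]:
    """Model of: startWord (anyWord)* matchPhrase, with the loop bounded and non-greedy."""
    if not tokens or tokens[0] != start_word:
        return None
    pos, skipped = 1, []
    while True:
        end = match_phrase(tokens, pos)  # non-greedy: try to leave the loop first
        if end is not None:
            return (start_word, skipped, tokens[pos:end])
        if len(skipped) >= max_skip or pos >= len(tokens):  # like {n < N}? failing
            return None
        skipped.append(tokens[pos])  # consume one anyWord (n++)
        pos += 1
```

Because the exit branch is tried before each extra anyWord, the first occurrence of matchPhrase wins even though its tokens would also be accepted by anyWord.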
Edit: Here's an example. I want to extract information from the following sentences:
Picture of a red flower.
Picture of the following: A red flower.
My input is actually tagged English sentences, and the lexer rules match the tags rather than the words. So the input to ANTLR is:
NN-PICTURE Picture IN-OF of DT a JJ-COLOR red NN-FLOWER flower
NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower
I have these lexer rules, plus a rule for each tag like this:
WS : (' ')+ {skip();};
TOKEN : (~' ')+;
nnpicture : 'NN-PICTURE' TOKEN -> ^('NN-PICTURE' TOKEN);
vbg : 'VBG' TOKEN -> ^('VBG' TOKEN);
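For trying the logic outside ANTLR, the two lexer rules (WS skipped, TOKEN being any run of non-space characters) amount to splitting on spaces, and since the input alternates tag and word, pairing consecutive tokens gives roughly what rules like nnpicture see. A Python sketch; lex and to_tagged_pairs are hypothetical helpers, and the pairing assumes every tag is followed by exactly one word, as in the two example lines.

```python
from typing import List, Tuple


def lex(text: str) -> List[str]:
    """Mimic WS : (' ')+ {skip();} and TOKEN : (~' ')+ ; only spaces separate tokens."""
    return [tok for tok in text.split(" ") if tok]


def to_tagged_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: group the token stream into (tag, word) pairs."""
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating tag/word tokens")
    return list(zip(tokens[0::2], tokens[1::2]))
```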
And my parser rules are something like this:
sentence : nnpicture inof matchFlower;
matchFlower : (dtTHE|dt)? jjcolor? nnflower;
Of course, this fails on the second sentence. So I want some flexibility: allow up to N tokens before the flower match. I have an anyWord rule that matches any token, and the following works:
sentence : nnpicture inof ( matchFlower
                          | anyWord matchFlower
                          | anyWord anyWord matchFlower
                          | etc.
but it isn't very elegant, and it scales badly with large N, since every extra allowed word means another alternative.
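Applied to the tagged sentences, what I want is one bounded loop in place of the hand-written alternatives. The Python sketch below models it on (tag, word) pairs; match_flower mirrors matchFlower but treats dtTHE and dt as a single DT tag (an assumption, since those rules aren't shown), and max_skip stands in for N. With max_skip=0 it behaves like my original sentence rule and rejects the second sentence.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (tag, word)


def match_flower(toks: List[Token], pos: int) -> Optional[int]:
    """Model of matchFlower : (dtTHE|dt)? jjcolor? nnflower ; returns end position or None."""
    if pos < len(toks) and toks[pos][0] == "DT":
        pos += 1
    if pos < len(toks) and toks[pos][0] == "JJ-COLOR":
        pos += 1
    if pos < len(toks) and toks[pos][0] == "NN-FLOWER":
        return pos + 1
    return None


def parse_sentence(toks: List[Token], max_skip: int) -> Optional[List[Token]]:
    """Model of sentence : nnpicture inof (anyWord)* matchFlower, loop bounded by max_skip."""
    if len(toks) < 2 or toks[0][0] != "NN-PICTURE" or toks[1][0] != "IN-OF":
        return None
    pos = 2
    for _ in range(max_skip + 1):
        end = match_flower(toks, pos)  # non-greedy: try the exit first
        if end is not None:
            return toks[pos:end]
        if pos >= len(toks):
            return None
        pos += 1  # consume one anyWord
    return None
```

On the second sentence, "DT the", "VBG following" and "COLON :" are skipped, so it needs max_skip of at least 3.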


matchPhrase is a subset of anyWord, so there could be a number of words that aren't in matchPhrase before matchPhrase, and they would be matched by anyWord. But because it is a subset, the anyWord match needs to be non-greedy; otherwise the matchPhrase words will be matched by anyWord. That's why I can't do anyWord? anyWord? anyWord? matchPhrase. - Matt Swain
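The greedy problem can be reproduced with a toy model. If each anyWord? takes a token whenever one is available (greedy), the words belonging to the phrase get swallowed; trying the phrase before each optional anyWord (non-greedy) finds it. This is a simplified Python model that assumes a parser committing to each optional without backtracking, with the phrase reduced to a fixed tag sequence (FLOWER is a hypothetical simplification); it does not reproduce ANTLR's actual LL(*) prediction or its ambiguity warnings.

```python
from typing import List

FLOWER = ["DT", "JJ-COLOR", "NN-FLOWER"]  # simplified matchPhrase: exact tag sequence


def phrase_at(tags: List[str], pos: int) -> bool:
    return tags[pos:pos + len(FLOWER)] == FLOWER


def greedy(tags: List[str], slots: int) -> bool:
    """`slots` copies of anyWord?, each taken whenever a token exists, then matchPhrase."""
    pos = 0
    for _ in range(slots):
        if pos < len(tags):
            pos += 1
    return phrase_at(tags, pos)


def non_greedy(tags: List[str], slots: int) -> bool:
    """Same slots, but each optional anyWord is skipped if matchPhrase can start here."""
    pos = 0
    for _ in range(slots):
        if phrase_at(tags, pos):
            return True
        if pos < len(tags):
            pos += 1
    return phrase_at(tags, pos)
```

In the greedy model the match only succeeds when the number of optional slots happens to equal the number of filler words, which is why the fixed-slot version can't replace a bounded non-greedy loop.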