I'm building an ANTLR4 grammar to parse strings from a data source. It is similar to, if not pretty much the same as, StringTemplate, except that I don't like that syntax, so I'm writing my own (also just for fun and learning, as this is my first experience with ANTLR). My grammar currently looks like this (simplified from what I actually have, but I've verified that it is a good example and exhibits the same problem I'm asking about):
grammar Combined1;
file:
    .*? (repToken .*?)+
    | .*?
    ;
foreach: '@foreach' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
with: '@with' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
// withx: '@withx' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
repvar: '@' (
    '$'
    | '(' nestedIdentifier ')'
    | nestedIdentifier
    ) ;
repToken:
    foreach
    | with
    // | withx
    | repvar
    ;
nestedIdentifier: Identifier ('.' Identifier)* ;
Identifier: [A-Za-z_] [A-Za-z0-9_]* ;
WS: [ \t\r\n] ;
Other: ( . ) ;
This grammar works just fine, allowing me to perform replacements such as:
string template = "Test: @foreach(@list){@$}";
Process(template, new { list = new [] { "A", "B", "C" } });
and the result would be:
Test: ABC
(The mechanics of how I process the tree to get this result are relatively simple but not relevant to the question, so I'm not providing that code.)
My question is this: if I add (uncomment) the withx rule right below the with rule, but forget to add (uncomment) withx as an alternative in repToken, then my example above breaks, even though it has absolutely nothing to do with withx. Once I add withx as an alternative in repToken, my example works again. Why?
Here's what I know:
- Regardless of whether withx is included or not, my lexer correctly returns 12 tokens: Test, :, ' ', @foreach, (, @, list, ), {, @, $, }. This isn't surprising, as I've only added a parser rule and not touched the lexer tokens (aside from adding the one implicit token '@withx').
- Before I add the withx rule, my parser correctly groups all the tokens after @foreach as children of the ForeachContext, resulting in a FileContext with 4 children (3 TerminalNodeImpl and a RepTokenContext).
- After I add the withx rule, my parser for some reason doesn't recognize the rest of the tokens as belonging to a ForeachContext, resulting in a FileContext with 10 children, none of which is a ForeachContext: they are all TerminalNodeImpl, except for 2 RepTokenContext corresponding to @list and @$.
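For anyone who wants to check the token-level claim without generating the ANTLR lexer, here is a rough, self-contained Python sketch that imitates the lexer side of the grammar. It is not ANTLR itself: the `tokenize` helper and the `LITERALS` list are my own illustration, and it assumes the usual lexer behaviour of taking the longest match and, on a tie, the rule defined first (with implicit literals placed ahead of the named rules). It says nothing about the parser, which is where the actual problem shows up.

```python
import re

# Implicit literal tokens taken from the parser rules of the grammar above.
# Assumption: they are tried before the named lexer rules, so they win ties.
LITERALS = ["@foreach", "@with", "(", ")", "{", "}", "@", "$", "."]

# Named lexer rules, in the order they appear in the grammar.
NAMED_RULES = [
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("WS", re.compile(r"[ \t\r\n]")),
    ("Other", re.compile(r".", re.DOTALL)),
]


def tokenize(text, literals=LITERALS):
    """Longest-match tokenizer; on equal length the earlier rule is kept."""
    rules = [("'%s'" % lit, re.compile(re.escape(lit))) for lit in literals]
    rules += NAMED_RULES
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best.
            if match and len(match.group()) > len(best_text):
                best_name, best_text = name, match.group()
        # 'Other' matches any single character, so best_text is never empty.
        tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


template = "Test: @foreach(@list){@$}"
without_withx = tokenize(template)
# Appending '@withx' stands in for uncommenting the withx parser rule.
with_withx = tokenize(template, LITERALS + ["@withx"])

print([text for _, text in without_withx])
# ['Test', ':', ' ', '@foreach', '(', '@', 'list', ')', '{', '@', '$', '}']
print(with_withx == without_withx)
# True
```

Under those assumptions the token stream is the same 12 tokens in both cases, which is consistent with the difference coming from the parser rather than the lexer.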
I'm completely baffled why adding a parser rule that doesn't have anything to do with my input would cause my parser to fail. Help!?
EDIT 3/17/2014: JavaMan asked for a parse tree in each scenario to clarify the description above. I don't know how to generate the parse tree graphic that he did, but here are two screenshots from the Visual Studio debugger illustrating the difference. Note that in these images I use longer names; specifically, ReplacementTokenContext is for repToken.
The first one is when I DO include withx in the alternative list (note that the tree is essentially FileContext -> ReplacementTokenContext (node index 3) -> ForeachContext):

And the second is when I DO NOT include withx in the alternative list (note that the tree is essentially FileContext -> TerminalNodeImpl "@foreach" (node index 3)):


.*? in your parser rules for performance reasons. - Sam Harwell