I have a grammar that works fine when parsing an entire file in one pass.
Now I want to break the parsing up into components and run the parser on subrules. I ran into an issue that I assume others parsing subrules will also see, with the following rule:
thing : LABEL? THING THINGDATA thingClause?
//{System.out.println("G4 Lexer/parser thing encountered");}
;
...
thingClause : ',' ID ( ',' ID)?
;
When the above rule is parsed from a top-level start rule that parses to EOF, everything works fine. When it is parsed as a subrule (without parsing to EOF), the parser complains when there is no thingClause, because it expects to see either a "," character or EOF:
line 8:0 mismatched input '%' expecting {<EOF>, ','}
When I parse to EOF, the % gets correctly parsed into another "thing" component, because the top level rule looks for:
toprule : thing+
| endOfThingsTokens
;
Since endOfThingsTokens occurs before EOF, I expect this is why the top-level rule works.
For parsing the subrule, I want the ANTLR4 parser to accept or ignore the % token and say "OK we aren't seeing a thingClause", then reset the token stream so the next thing object can be parsed by a different instance of the parser.
In this specific case I could change the lexer to pass newlines, which the lexer grammar currently skips, through to the parser. That would require many other changes to accept newlines in the token stream, none of which are currently needed.
Essentially I need some way to give the rule an "end of record" token, but I was wondering whether this could be solved with a semantic predicate instead.
Something like:
thing : { if comma before %}? LABEL? THING THINGDATA thingClause?
| LABEL? THING THINGDATA
;
...
thingClause : ',' ID ( ',' ID)?
;
The predicate pseudocode above would hide the optional thingClause? when it cannot be satisfied, so that the parser stops after parsing one "thing" without looking for a specific "end of thing" token (i.e. newline).
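To make the intent concrete, here is a minimal hand-written Java sketch of the behaviour I am after. It is not ANTLR-generated code and does not use the ANTLR runtime. All names in it (ThingSketch, Tok, commaIsNext, parseThing, and PERCENT as a stand-in for whatever '%' lexes to) are made up for illustration. In a real grammar the same check would presumably be written as a predicate such as {_input.LA(1) == COMMA}? placed in front of thingClause, assuming the lexer defines a named COMMA token. I have not verified that this removes the error.

```java
import java.util.List;

public class ThingSketch {
    // Hypothetical token types; a real ANTLR lexer generates its own constants.
    enum Tok { LABEL, THING, THINGDATA, COMMA, ID, PERCENT }

    /** Mirrors the intended predicate: is the next unconsumed token a comma? */
    static boolean commaIsNext(List<Tok> tokens, int pos) {
        return pos < tokens.size() && tokens.get(pos) == Tok.COMMA;
    }

    /** Consumes one token of the wanted type or fails; returns the next index. */
    static int expect(List<Tok> tokens, int pos, Tok wanted) {
        if (pos >= tokens.size() || tokens.get(pos) != wanted) {
            throw new IllegalStateException("expected " + wanted + " at index " + pos);
        }
        return pos + 1;
    }

    /**
     * Parses exactly one "thing" starting at pos and returns the index of the
     * first token it did NOT consume. No EOF or end-of-record token is required.
     */
    static int parseThing(List<Tok> tokens, int pos) {
        if (pos < tokens.size() && tokens.get(pos) == Tok.LABEL) {
            pos++;                                   // LABEL?
        }
        pos = expect(tokens, pos, Tok.THING);        // THING
        pos = expect(tokens, pos, Tok.THINGDATA);    // THINGDATA
        if (commaIsNext(tokens, pos)) {              // thingClause?
            pos = expect(tokens, pos, Tok.COMMA);
            pos = expect(tokens, pos, Tok.ID);
            if (commaIsNext(tokens, pos)) {          // ( ',' ID )?
                pos = expect(tokens, pos, Tok.COMMA);
                pos = expect(tokens, pos, Tok.ID);
            }
        }
        return pos;                                  // anything after this is left alone
    }

    public static void main(String[] args) {
        // A thing without a clause, followed by a token that belongs to the next record.
        List<Tok> input = List.of(Tok.THING, Tok.THINGDATA, Tok.PERCENT);
        int stop = parseThing(input, 0);
        System.out.println("stopped before index " + stop + ", next token: " + input.get(stop));
    }
}
```

The point of the sketch is that the only lookahead needed is one token: if it is not a comma, the rule simply finishes and the returned index tells the next parser instance where to resume.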
If I solve this I will post the answer.