Is the recursion creating an actual, measured performance problem? If so, can you quantify the extent of the recursion you are dealing with? Antlr is usually quite good at handling recursion, so if you are experiencing a real performance problem, it may be due to a deeper issue in Antlr. Ter and Sam will need to reproduce it in order to deal with it.
That said, every instance of expr on the rule RHS will create a recursion. Grouping and limiting the instances
expr : expr ('AND' | 'OR') expr
     | 'a' | 'b' | 'c' | 'd'
     ;
won't change the number of recursions necessary to satisfy the rule -- that depends on the data being processed.
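For comparison, the ungrouped form below references expr four times on the RHS instead of two, yet for any given input it should perform the same recursive invocations:

expr : expr 'AND' expr
     | expr 'OR' expr
     | 'a' | 'b' | 'c' | 'd'
     ;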
What might be a greater gain, if your performance problem actually stems from a lot of deep recursion rule failures, is to restructure your grammar to fail the rule(s) sooner, or to limit the number of rules (or subrules) potentially applicable to any given sequence of data.
Exactly how is difficult to say based on the information provided so far.
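Still, one common pattern (the rules here are hypothetical, not taken from your grammar) is left-factoring a shared prefix so that a wrong alternative is rejected after a single token rather than after a deep speculative descent:

// before: both alternatives begin with ID, so the parser must
// speculate past the prefix before committing to either one
stat : ID '=' expr ';'
     | ID '(' ')' ';'
     ;

// after: the shared prefix is matched once, and one distinguishing
// token selects the tail, so failures surface immediately
stat     : ID statTail ;
statTail : '=' expr ';'
         | '(' ')' ';'
         ;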
Updated
To clarify, a recursive rule call in Antlr is not implemented differently from any other rule call; in particular, it is not implemented as a recursive Java method call.
The LL(*) algorithm of Antlr is implicitly a sequential path-solver operating against a precomputed network of valid states. Checkpoint information is preserved for backtracking at each decision point, which includes rule calls. The checkpoint data necessary to capture a rule-call state transition is relatively simple and is not sensitive to whether the target node represents a grammar-based recursive call or not. Performance largely correlates with the number of rule node evaluations, in particular including all those on rule subpaths that are tried and eventually fail.
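If you want to see those rule and decision evaluation counts for yourself, and you are on Antlr 4, the runtime can report per-decision statistics. A minimal sketch, assuming ExprLexer/ExprParser are the classes generated from the expr grammar above (substitute your own):

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.atn.DecisionInfo;

public class ProfileExpr {
    public static void main(String[] args) {
        ExprLexer lexer = new ExprLexer(CharStreams.fromString("a AND b OR c"));
        ExprParser parser = new ExprParser(new CommonTokenStream(lexer));
        parser.setProfile(true);   // collect statistics during prediction
        parser.expr();             // invoke the entry rule
        for (DecisionInfo d : parser.getParseInfo().getDecisionInfo()) {
            if (d.invocations > 0) {
                System.out.printf("decision %d: %d invocations, %d ns predicting%n",
                        d.decision, d.invocations, d.timeInPrediction);
            }
        }
    }
}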
That is why unrolling a recursive rule into several rules is unlikely to improve performance. If the unrolled rules collectively implement the same recursive function, then at best, execution against the same input will require the same number of rule calls. At worst, the non-minimal expression of rules will require more rule calls and will likely incur more internal checkpoints (every * and + parenthetical is implicitly guarded).
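To make that concrete, here is one hypothetical unrolling of the expr rule above into a precedence ladder. It accepts the same input strings (though the tree shapes differ), but parsing even a bare 'a' now costs four rule calls instead of one, and each * loop adds an implicit guard:

expr    : orExpr ;
orExpr  : andExpr ('OR' andExpr)* ;
andExpr : atom ('AND' atom)* ;
atom    : 'a' | 'b' | 'c' | 'd' ;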
Assuming that your grammar is optimally minimized, and that the order of the 20+ expression rules is statistically optimized for your source data, you could optimize further by walking only the exact subtree that will actually match the next sequence of source data.
Just statistically predict the correct subtree rule in advance. Given the number of times you are evaluating the parse tree, you should fairly quickly be able to correlate some cheaply computed signature of source-data sequences with the subtrees eventually matched, or at least decide on a more appropriate ordering of subtrees to try.
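A minimal sketch of that bookkeeping in plain Java (SubtreePredictor and its two-token signature are hypothetical placeholders, not an Antlr API):

import java.util.HashMap;
import java.util.Map;

public class SubtreePredictor {
    // signature -> (subtree rule name -> how often that rule won)
    private final Map<String, Map<String, Integer>> stats = new HashMap<>();

    // A deliberately cheap signature: the first two tokens of the pending input
    public static String signature(String tok1, String tok2) {
        return tok1 + "|" + tok2;
    }

    // Call after each successful parse to record which subtree matched
    public void recordMatch(String sig, String ruleName) {
        stats.computeIfAbsent(sig, k -> new HashMap<>())
             .merge(ruleName, 1, Integer::sum);
    }

    // The historically most frequent rule for this signature, or null if unseen
    public String predict(String sig) {
        Map<String, Integer> counts = stats.get(sig);
        if (counts == null) return null;
        return counts.entrySet().stream()
                     .max(Map.Entry.comparingByValue())
                     .map(Map.Entry::getKey)
                     .orElse(null);
    }
}

Try the predicted rule first, and fall back to your normal ordering when the prediction misses or the signature has not been seen before.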
Exactly what the signature function should be, or whether the cost-benefit will be positive, is difficult to say based on the information provided so far. It really depends on the nature of your source data.