3 votes

I have a lexer that creates MACRO tokens for a dynamic list of macro strings passed to the lexer. I used a semantic predicate in the topmost lexer rule to implement this feature:

MACRO: { macros != null && tryMacro() }? .;

Where tryMacro() just checks if any macro string matches the input sequence.
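
The question doesn't show tryMacro() itself; a minimal sketch of what such a helper could look like as a lexer member follows (the macros field and the matching logic are assumptions, not the asker's code; _input.LA() is the standard ANTLR4 CharStream lookahead):

    @lexer::members {
        public java.util.List<String> macros; // injected by the application (assumption)

        // Hypothetical reconstruction: returns true if any macro string
        // matches the characters ahead of the current lexer position.
        // _input.LA(1) is the next character to be consumed.
        public boolean tryMacro() {
            for (String macro : macros) {
                boolean matched = true;
                for (int i = 0; i < macro.length(); i++) {
                    if (_input.LA(i + 1) != macro.charAt(i)) {
                        matched = false;
                        break;
                    }
                }
                if (matched) return true;
            }
            return false;
        }
    }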

The performance of this approach was very bad, and after some research I tried changing the lexer rule to the following:

MACRO: . { macros != null && tryMacro() }?;

This improved the performance dramatically, but I don't really understand why. :) Since '.' matches any character, the semantic predicate should be invoked exactly as many times as before, shouldn't it? Can someone provide an explanation for this behavior?

This problem is well known. I described it in my ANTLR4 C++ target blog post: soft-gems.net/index.php/tools/49-the-antlr4-c-target-is-here. Predicates on the left-hand side of a rule prevent the DFA generated for it from being cached and cause the predicates to be evaluated again and again. This is especially bad if you have many alts, e.g. many keywords (lexer rules are kind of alts of a hidden root context). - Mike Lischke

@MikeLischke your link is broken :( - Ivan Kochurkin

I moved the site. The link has changed to: soft-gems.net/the-antlr4-c-target-is-here - Mike Lischke
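
To make the comment concrete, the two placements below differ exactly in whether matching can run through the cached DFA before any predicate fires (a sketch with a made-up inKeywordMode() predicate, not code from the blog post):

    // Slow: the predicate sits before the match, so it must be evaluated
    // during every prediction and the decision cannot be fully cached.
    KW_IF : { inKeywordMode() }? 'if' ;

    // Fast: the cached DFA matches 'if' first; the predicate is evaluated
    // only when this rule is actually a candidate.
    KW_IF : 'if' { inKeywordMode() }? ;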

2 Answers

2 votes

The reason is pretty simple: if you put the predicate at the start, the lexer evaluates it to decide whether the MACRO rule should apply at all. If you put it at the end, the check is only performed once the lexer has a potential match for the MACRO rule.

Since MACRO is very generic, I suppose you put it at the end of your rules, and due to the priority rules it will surely be tried last. Since it can match only single-character tokens, more specific rules will take priority over it.

If the MACRO rule is superseded by a higher-priority rule, it won't be considered and your predicate won't be invoked.
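
To illustrate the ordering argument, consider a grammar laid out like the sketch below (rule names invented for the example). For any letter, ID wins over MACRO because it matches a longer sequence, or, on a single-letter tie, because it is declared first; the predicate at the end of MACRO therefore only runs when the single-character match is actually the best candidate:

    ID    : [a-zA-Z_]+ ;    // longer match, declared earlier: takes priority
    INT   : [0-9]+ ;
    WS    : [ \t\r\n]+ -> skip ;
    MACRO : . { macros != null && tryMacro() }? ;    // tried last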

1 vote

I debugged this a bit further, and it turned out that reordering the rule changed the behavior of the lexer, causing macros to no longer be accepted during parsing. The perceived performance increase came from the fact that the semantic predicate was only evaluated a couple of times before the lexer dropped the rule while doing its predictions. So the changed rule was actually invalid, not a performance improvement.

I finally solved the performance issue by moving the macro handling to the parser.
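
The answer doesn't show the final design; one way to move the check into the parser (a sketch under that assumption, with an invented isMacro() helper; _input.LT(1) is the standard ANTLR4 parser token lookahead) is to lex macro names as ordinary identifiers and gate a parser rule with the predicate:

    @parser::members {
        private java.util.Set<String> macros; // injected, as in the lexer before
        boolean isMacro(String name) { return macros != null && macros.contains(name); }
    }

    macroRef
        : { isMacro(_input.LT(1).getText()) }? IDENTIFIER
        ;

This keeps the lexer predicate-free, so its DFA can be cached, and the macro table is consulted only at the few places in the grammar where a macro reference can actually occur.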