
I've migrated quite a large ANTLR2 grammar to ANTLR4, and reached a point where the output of both grammars is almost identical, apart from a few edge cases. However, some files take extremely long to parse (even with SLL prediction mode and BailErrorStrategy), so I'm wondering how I could find which rules should be fixed first.
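For reference, by that I mean the usual two-stage setup, roughly along these lines (a sketch: MyLexer, MyParser and program() are placeholders for my generated classes and entry rule):

// uses org.antlr.v4.runtime.*, org.antlr.v4.runtime.atn.PredictionMode
// and org.antlr.v4.runtime.misc.ParseCancellationException
CharStream input = CharStreams.fromFileName(fileName);
CommonTokenStream tokens = new CommonTokenStream(new MyLexer(input));
MyParser parser = new MyParser(tokens);

// Stage 1: fast SLL prediction, bail out at the first syntax error
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy());
try {
    parser.program();
} catch (ParseCancellationException e) {
    // Stage 2: rewind and reparse with full LL prediction
    tokens.seek(0);
    parser.reset();
    parser.setErrorHandler(new DefaultErrorStrategy());
    parser.getInterpreter().setPredictionMode(PredictionMode.LL);
    parser.program();
}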

I've already gathered some statistics with Parser#setProfile(), but I don't know how to interpret the results in each DecisionInfo object. Is there any good documentation on how to start optimizing a large ANTLR4 grammar, and how to find which rabbit to chase first?

If you are an IntelliJ user, you might want to use the ANTLR plugin: plugins.jetbrains.com/plugin/7358-antlr-v4-grammar-plugin . It has a built-in profiler. – Clashsoft
I'll give this plugin another try, as the visualization of ambiguities and lookaheads looks very interesting. But I had issues last time with this specific grammar, where the lexer is completely separated from the parser. – Gilles QUERRET
Hmm, I've used the plugin with separate lexer and parser grammars before and I didn't notice any problems. Hope it works for you! – Clashsoft

1 Answer


As I didn't know what to look for in the DecisionInfo objects, here's what I found; it helped me improve the parse time by at least an order of magnitude.

First, I enabled profiling on the grammar with org.antlr.v4.runtime.Parser.setProfile(boolean profile), then executed the parser with org.antlr.v4.runtime.Parser.getInterpreter().setPredictionMode(PredictionMode.SLL) on thousands of files.
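A rough sketch of that setup (Proparse is the generated parser; tokenStream and program() stand in for the separate lexer's token stream and the actual entry rule):

Proparse parser = new Proparse(tokenStream);                    // tokens from the standalone lexer
parser.setProfile(true);                                        // record per-decision statistics
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);  // SLL-only prediction
parser.program();                                               // stats accumulate in parser.getParseInfo()

I then browsed through the rules with the highest prediction time: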

Arrays.stream(parser.getParseInfo().getDecisionInfo())
      // timeInPrediction is in nanoseconds, so this keeps decisions above 100 ms
      .filter(decision -> decision.timeInPrediction > 100000000)
      .sorted((d1, d2) -> Long.compare(d2.timeInPrediction, d1.timeInPrediction))
      .forEach(decision -> System.out.println(
            String.format("Time: %d in %d calls - LL_Lookaheads: %d Max k: %d Ambiguities: %d Errors: %d Rule: %s",
                decision.timeInPrediction / 1000000,   // nanoseconds -> milliseconds
                decision.invocations, decision.SLL_TotalLook,
                decision.SLL_MaxLook, decision.ambiguities.size(),
                decision.errors.size(),
                // map the decision number back to the rule it belongs to
                Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));

and then on the highest max lookahead, using the same lambda except for the filter and sort:

.filter(decision -> decision.SLL_MaxLook > 50)
.sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
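For completeness, the full variant of that second query (same assumptions as above, with the output trimmed to the lookahead figures):

Arrays.stream(parser.getParseInfo().getDecisionInfo())
      .filter(decision -> decision.SLL_MaxLook > 50)   // SLL_MaxLook counts tokens of lookahead
      .sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
      .forEach(decision -> System.out.println(
            String.format("Max k: %d in %d calls - Rule: %s",
                decision.SLL_MaxLook, decision.invocations,
                Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));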

This gave me four rules where most of the time was spent, and in this case that was enough to see what had to be changed, simply because I finally knew where to look for problems.