
I've migrated quite a large ANTLR2 grammar to ANTLR4, and reached a point where the output of both grammars is almost identical, apart from a few edge cases. However, some files take extremely long to parse (even with SLL prediction mode and BailErrorStrategy), so I'm wondering how I could find which rules should be fixed first.
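For reference, by that I mean the usual two-stage setup, roughly along these lines (a sketch: MyLexer, MyParser and program() are placeholders for my generated classes and entry rule):

// uses org.antlr.v4.runtime.*, org.antlr.v4.runtime.atn.PredictionMode
// and org.antlr.v4.runtime.misc.ParseCancellationException
CharStream input = CharStreams.fromFileName(fileName);
CommonTokenStream tokens = new CommonTokenStream(new MyLexer(input));
MyParser parser = new MyParser(tokens);

// Stage 1: fast SLL prediction, bail out at the first syntax error
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy());
try {
    parser.program();
} catch (ParseCancellationException e) {
    // Stage 2: rewind and reparse with full LL prediction
    tokens.seek(0);
    parser.reset();
    parser.setErrorHandler(new DefaultErrorStrategy());
    parser.getInterpreter().setPredictionMode(PredictionMode.LL);
    parser.program();
}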

I've already gathered some statistics with Parser#setProfile(), but I don't know how to interpret the results in each DecisionInfo object. Is there any good documentation on how to start optimizing a large ANTLR4 grammar, and how to find which rabbit to chase first?

If you are an IntelliJ user, you might want to use the ANTLR plugin: plugins.jetbrains.com/plugin/7358-antlr-v4-grammar-plugin . It has a built-in profiler. – Clashsoft
I'll give this plugin another try, as the visualization of ambiguities and lookaheads looks very interesting. But I had issues last time with this specific grammar, where the lexer is completely separated from the parser. – Gilles QUERRET
Hmm, I've used the plugin with separate lexer and parser grammars before and I didn't notice any problems. Hope it works for you! – Clashsoft

1 Answer


As I didn't know what to look for in the DecisionInfo objects, here's what I found; it helped me improve the parse time by at least an order of magnitude.

First, I enabled profiling on the grammar with org.antlr.v4.runtime.Parser.setProfile(boolean profile), then executed the parser with org.antlr.v4.runtime.Parser.getInterpreter().setPredictionMode(PredictionMode.SLL) on thousands of files.
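A rough sketch of that setup (Proparse is the generated parser; tokenStream and program() stand in for the separate lexer's token stream and the actual entry rule):

Proparse parser = new Proparse(tokenStream);                    // tokens from the standalone lexer
parser.setProfile(true);                                        // record per-decision statistics
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);  // SLL-only prediction
parser.program();                                               // stats accumulate in parser.getParseInfo()

I then browsed through the rules with the highest prediction time: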

Arrays.stream(parser.getParseInfo().getDecisionInfo())
      // timeInPrediction is in nanoseconds, so this keeps decisions above 100 ms
      .filter(decision -> decision.timeInPrediction > 100000000)
      .sorted((d1, d2) -> Long.compare(d2.timeInPrediction, d1.timeInPrediction))
      .forEach(decision -> System.out.println(
            String.format("Time: %d in %d calls - LL_Lookaheads: %d Max k: %d Ambiguities: %d Errors: %d Rule: %s",
                decision.timeInPrediction / 1000000,   // nanoseconds -> milliseconds
                decision.invocations, decision.SLL_TotalLook,
                decision.SLL_MaxLook, decision.ambiguities.size(),
                decision.errors.size(),
                // map the decision number back to the rule it belongs to
                Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));

and then on the highest max lookahead, using the same lambda except for the filter and sort:

.filter(decision -> decision.SLL_MaxLook > 50)
.sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
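For completeness, the full variant of that second query (same assumptions as above, with the output trimmed to the lookahead figures):

Arrays.stream(parser.getParseInfo().getDecisionInfo())
      .filter(decision -> decision.SLL_MaxLook > 50)   // SLL_MaxLook counts tokens of lookahead
      .sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
      .forEach(decision -> System.out.println(
            String.format("Max k: %d in %d calls - Rule: %s",
                decision.SLL_MaxLook, decision.invocations,
                Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));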

This gave me four rules where most of the time was spent, and in this case that was enough to see what had to be changed, simply because I finally knew where to look for problems.