ANTLR grammar: parser- and lexer literals

9

votes

What's the difference between this grammar:

...
if_statement : 'if' condition 'then' statement 'else' statement 'end_if';
...

and this:

...
if_statement : IF condition THEN statement ELSE statement END_IF;
...

IF : 'if';
THEN: 'then';
ELSE: 'else';
END_IF: 'end_if';
....

?

If there is any difference, as this impacts on performance ... Thanks

parsingantlrtokenlexerantlr3

9

votes

In addition to Will's answer, it's best to define your lexer tokens explicitly (in your lexer grammar). In case you're mixing them in your parser grammar, it's not always clear in what order the tokens are tokenized by the lexer. When defining them explicitly, they're always tokenized in the order they've been put in the lexer grammar (from top to bottom).

2

votes

The biggest difference is one that may not matter to you. If your Lexer rules are in the lexer then you can use inheritance to have multiple lexer's share a common set of lexical rules.

If you just use strings in your parser rules then you can not do this. If you never plan to reuse your lexer grammar then this advantage doesn't matter.

Additionally I, and I'm guessing most Antlr veterans, are more accustom to finding the lexer rules in the actual lexer grammar rather than mixed in with the parser grammar, so one could argue the readability is increased by putting the rules in the lexer.

There is no runtime performance impact after the Antlr parser has been built to either approach.

1

votes

The only difference is that in your first production rule, the keyword tokens are defined implicitly. There is no run-time performance implication for tokens defined implicitly vs. explicitly.

0

votes

Yet another difference: when you explicitly define your lexer rules you can access them via the name you gave them (e.g. when you need to check for a specific token type). Otherwise ANTLR will use arbitrary numbers (with a prefix).

ANTLR grammar: parser- and lexer literals

4 Answers