Using different case keywords in ANTLR grammar

Question

Some keywords (string constants) in my grammar contain capital letters, e.g.:

PREV_VALUE : 'PreviousValue';

This causes strange parsing behavior: other tokens that contain the same capital letters ('P', 'V') are parsed incorrectly.

Here's a simplified version of the lexer grammar:

lexer grammar ExpressionLexer;

COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;

When I try to parse an expression like this:

var expression = "P"; // capital 'P', which also occurs in the keyword 'PreviousValue'
var stringReader = new StringReader(expression);
var input = new ANTLRReaderStream(stringReader);
var expressionLexer = new ExpressionLexer(input);
var tokens = new CommonTokenStream(expressionLexer);

the tokens._tokens collection contains a single value:

[0] = {[@0,1:1='<EOF>',<-1>,1:1]}

This is incorrect.

If I change the expression to "p" (a lowercase letter), the tokens._tokens collection contains two values:

[0] = {[@0,0:0='p',<0>,1:0]}
[1] = {[@1,1:1='<EOF>',<-1>,1:1]}

This is correct.

When the line PREV_VALUE : 'PreviousValue'; is removed from the grammar, both expressions are parsed correctly.

Is it possible to use mixed case in keywords? Is there an example of such keywords in an ANTLR grammar?

Sorry for the confusion I've edited my initial post. Hope that clears it up. — Villa F.

Bart Kiers · Accepted Answer · 2012-01-14T08:57:40

I find it hard to believe a p token is created based on the grammar you posted. Lexer rules that have fragment in front of them will not produce tokens: these rules are only used by other lexer rules.

A simple demo shows this:

lexer grammar ExpressionLexer;

@lexer::members {
  public static void main(String[] args) throws Exception {
    ExpressionLexer lexer = new ExpressionLexer(new ANTLRStringStream(args[0]));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill(); // remove this line when using ANTLR 3.2 or an older version
    System.out.println(tokens);
  }
}

COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;

Now generate the lexer and compile the .java source file:

java -cp antlr-3.3.jar org.antlr.Tool ExpressionLexer.g 
javac -cp antlr-3.3.jar *.java

and run a few tests:

java -cp .:antlr-3.3.jar ExpressionLexer p
line 1:0 no viable alternative at character 'p'

which is correct since there is no (non-fragment) rule that starts with, or matches, a "p".

java -cp .:antlr-3.3.jar ExpressionLexer P
line 1:1 mismatched character '<EOF>' expecting 'r'

which is correct since the only (non-fragment) rule that starts with a "P" expects an "r" to be the next character (which isn't there).
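
To make the two error messages above easier to follow, here is a rough, hand-written Java model of what the lexer does with this grammar. It is only a sketch and not the code ANTLR generates: the class name KeywordLexerSketch, the lex method, the message format and the error recovery are all my own assumptions. It only mirrors the idea that fragment rules never start a token, and that a leading "P" commits the lexer to PREV_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model for illustration only; this is NOT the code ANTLR generates.
public class KeywordLexerSketch {
    // Stand-ins for the non-fragment rules of the grammar above.
    static final String KEYWORD = "PreviousValue";          // PREV_VALUE
    static final String SINGLE_CHAR_TOKENS = ",()[]+-*/";   // COMMA .. DIV

    // Returns one entry per token or error, in input order.
    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (SINGLE_CHAR_TOKENS.indexOf(c) >= 0) {
                out.add("token '" + c + "'");
                i++;
            } else if (c == KEYWORD.charAt(0)) {
                // Only PREV_VALUE starts with 'P', so commit to it and
                // require the rest of the keyword to follow.
                int k = 1;
                while (k < KEYWORD.length() && i + k < input.length()
                        && input.charAt(i + k) == KEYWORD.charAt(k)) {
                    k++;
                }
                if (k == KEYWORD.length()) {
                    out.add("token '" + KEYWORD + "'");
                    i += k;
                } else {
                    out.add("mismatched character at " + (i + k)
                            + " expecting '" + KEYWORD.charAt(k) + "'");
                    i += k + 1; // simplified recovery (an assumption): skip the offending character
                }
            } else {
                // LETTER, DIGIT, etc. are fragments, so no token can start here.
                out.add("no viable alternative at character '" + c + "'");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("p"));
        System.out.println(lex("P"));
        System.out.println(lex("PreviousValue+"));
    }
}
```

In this model, "p" gets the "no viable alternative" branch because no non-fragment rule can start with it, while "P" enters the keyword branch and fails at index 1 where an "r" was expected. That is the same shape as the two ANTLR messages above.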
