Some keywords (string constants) in my grammar contain capital letters, e.g.:

PREV_VALUE : 'PreviousValue';

This causes strange parsing behavior: other tokens that contain the same capital letters ('P', 'V') are parsed incorrectly.

Here's a simplified version of the lexer grammar:

lexer grammar ExpressionLexer;

COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;

When I try to parse an expression like this:

var expression = "P"; // capital 'P', which is also part of the keyword 'PreviousValue'
var stringReader = new StringReader(expression);
var input = new ANTLRReaderStream(stringReader);
var expressionLexer = new ExpressionLexer(input);
var tokens = new CommonTokenStream(expressionLexer);

the tokens._tokens collection contains a single value:

[0] = {[@0,1:1='<EOF>',<-1>,1:1]}

It's incorrect.

If I change the expression to 'p' (lowercase letter), the tokens._tokens collection contains two values:

[0] = {[@0,0:0='p',<0>,1:0]}
[1] = {[@1,1:1='<EOF>',<-1>,1:1]}

It's correct.

When the line PREV_VALUE : 'PreviousValue'; is removed from the grammar, both expressions are parsed correctly.

Is it possible to use mixed case in keywords? Is there an example of such keywords in an ANTLR grammar?

Sorry for the confusion. I've edited my initial post; I hope that clears it up. - Villa F.

1 Answer

I find it hard to believe that a p token is created from the grammar you posted. Lexer rules prefixed with fragment do not produce tokens: such rules can only be used by other lexer rules.

A simple demo shows this:

lexer grammar ExpressionLexer;

@lexer::members {
  public static void main(String[] args) throws Exception {
    ExpressionLexer lexer = new ExpressionLexer(new ANTLRStringStream(args[0]));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill(); // remove this line when using ANTLR 3.2 or an older version
    System.out.println(tokens);
  }
}

COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;

Now generate the lexer and compile the .java source file:

java -cp antlr-3.3.jar org.antlr.Tool ExpressionLexer.g 
javac -cp antlr-3.3.jar *.java

and run a few tests:

java -cp .:antlr-3.3.jar ExpressionLexer p
line 1:0 no viable alternative at character 'p'

which is correct since there is no (non-fragment) rule that starts with, or matches, a "p".

java -cp .:antlr-3.3.jar ExpressionLexer P
line 1:1 mismatched character '<EOF>' expecting 'r'

which is correct since the only (non-fragment) rule that starts with a "P" expects an "r" to be the next character (which isn't there).
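
To make the reasoning behind both results concrete, here is a minimal toy model in plain Java. It is not ANTLR's generated lexer or its actual algorithm: the class ToyLexer and the method describe are hypothetical names made up for this sketch, and the message strings merely imitate the format of the output shown above. It assumes that only the non-fragment rules can start a token, and that once the first character selects a rule, the rest of that rule's literal must follow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the posted lexer (hypothetical; NOT ANTLR's real algorithm). */
public class ToyLexer {
    // Only the token-producing (non-fragment) rules, as rule name -> literal text.
    static final Map<String, String> TOKEN_RULES = new LinkedHashMap<>();
    static {
        TOKEN_RULES.put("COMMA", ",");
        TOKEN_RULES.put("LPAREN", "(");
        TOKEN_RULES.put("RPAREN", ")");
        TOKEN_RULES.put("LBRACK", "[");
        TOKEN_RULES.put("RBRACK", "]");
        TOKEN_RULES.put("PLUS", "+");
        TOKEN_RULES.put("MINUS", "-");
        TOKEN_RULES.put("MULT", "*");
        TOKEN_RULES.put("DIV", "/");
        TOKEN_RULES.put("PREV_VALUE", "PreviousValue");
        // DIGIT, LETTER, TAB, NEWLINE and SPACE are fragments, so they are
        // deliberately absent: they never start a token on their own.
    }

    /** Returns the rule name matched at the start of the input, or an error text. */
    static String describe(String input) {
        if (input.isEmpty()) {
            return "<EOF>";
        }
        char first = input.charAt(0);
        for (Map.Entry<String, String> rule : TOKEN_RULES.entrySet()) {
            String literal = rule.getValue();
            if (literal.charAt(0) != first) {
                continue; // this rule cannot start with the current character
            }
            // The first character selected this rule; the rest must follow.
            for (int i = 1; i < literal.length(); i++) {
                String seen = i < input.length()
                        ? String.valueOf(input.charAt(i))
                        : "<EOF>";
                if (!seen.equals(String.valueOf(literal.charAt(i)))) {
                    return "mismatched character '" + seen
                            + "' expecting '" + literal.charAt(i) + "'";
                }
            }
            return rule.getKey();
        }
        // No token-producing rule starts with this character.
        return "no viable alternative at character '" + first + "'";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"p", "P", "PreviousValue", "+"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

In this model "p" is rejected because no token rule starts with it, while "P" selects PREV_VALUE and then fails on the missing 'r'. That is the same distinction as in the two test runs above, and it suggests the keyword's capital letters are not the problem: the grammar simply has no non-fragment rule that could match a lone "P" or "p".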