I have a very simple grammar that looks like this:

grammar Testing;

a :  d | b;
b : {_input.LT(1).equals("b")}? C;
d : {!_input.LT(1).equals("b")}? C;
C : .;

It parses one character from the input and checks whether it is equal to the character b. If so, rule b should be used; if not, rule d should be used.

However, the resulting parse tree does not match that expectation: every input is parsed using the first alternative (rule d).

$ antlr Testing.g4
$ javac *.java
$ grun Testing a -trace
c
enter   a, LT(1)=c
enter   d, LT(1)=c
consume [@0,0:0='c',<1>,1:0] rule d
exit    d, LT(1)=

exit    a, LT(1)=

$ grun Testing a -trace
b
enter   a, LT(1)=b
enter   d, LT(1)=b
consume [@0,0:0='b',<1>,1:0] rule d
exit    d, LT(1)=

exit    a, LT(1)=

In both cases, rule d is used. However, since there is a guard on rule d, I expect rule d to fail when the first character is exactly 'b'.

Am I doing something wrong when using the semantic predicates?

(I need to use semantic predicates because I need to parse a language where keywords could be used as identifiers).

Reference: https://github.com/antlr/antlr4/blob/master/doc/predicates.md

1 Answer

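_input.LT(int) returns a Token, not a String, so Token.equals(String) always returns false. That makes the predicate on b always false and the negated predicate on d always true, which is why d is chosen for every input. What you want to do is call getText() on the Token: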

b : {_input.LT(1).getText().equals("b")}? C;
d : {!_input.LT(1).getText().equals("b")}? C;
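To see why the original predicates behave this way, here is a minimal sketch in plain Java. FakeToken is a hypothetical stand-in for ANTLR's Token (only getText() is modeled, and it is assumed not to override equals in any way that would match a String), so this illustrates the comparison itself rather than ANTLR's actual classes:

```java
public class TokenEqualsDemo {
    // Hypothetical stand-in for an ANTLR Token; only getText() is modeled.
    static final class FakeToken {
        private final String text;

        FakeToken(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    // Mirrors the original predicate: compares the token object itself to a String.
    // Object.equals is identity-based here, so this can never be true.
    static boolean brokenPredicate(FakeToken t) {
        return t.equals("b");
    }

    // Mirrors the corrected predicate: compares the token's text to the String.
    static boolean fixedPredicate(FakeToken t) {
        return t.getText().equals("b");
    }

    public static void main(String[] args) {
        FakeToken b = new FakeToken("b");
        System.out.println(brokenPredicate(b)); // false
        System.out.println(fixedPredicate(b));  // true
    }
}
```

With the broken form, the guard on b is always false and the negated guard on d is always true, which matches the trace in the question.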

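However, it is often easier to handle keywords-as-identifiers without predicates: define the keywords as ordinary lexer tokens (placed before IDENTIFIER so the lexer prefers them), and let an identifier parser rule accept those keyword tokens as alternatives: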

rule
 : KEYWORD_1 identifier
 ;

identifier
 : IDENTIFIER
 | KEYWORD_1
 | KEYWORD_2
 | KEYWORD_3
 ;

KEYWORD_1 : 'k1';
KEYWORD_2 : 'k2';
KEYWORD_3 : 'k3';

IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;