1
votes

I am quite new to ANTLR and have looked around for a while for a way to fix my problem, unfortunately without any success.

I simplified my grammar to describe the problem (the token TAG is used in the real example):


grammar Test;

WORD : ('a'..'z')+;

DOT : '.';

TAG : '.test';

WHITE_SPACE
    :   (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};


rule
    :   'a' DOT WORD 'z';

When I try to parse the input "a .bcd z" everything is fine, but when I try the input "a .tbyfa z" I get these errors:

line 1:4 mismatched character 'b' expecting 'e'
line 1:5 missing DOT at 'yfa'

In my opinion the problem is that the string after the "." starts with a "t" which could also be the token ".test". I tried backtrack=true, but also without any success.


How can I fix that problem?
Thanks in advance.

1
".test" is a keyword. The dot in ".test" has a different meaning than the DOT token. - user1286372

1 Answer

2
votes

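ANTLR's lexer cannot backtrack to an alternative in this case. Once the lexer sees ".t", it commits to the TAG token. When the next character turns out to be 'b' instead of 'e', that match fails, and there is no other token that starts with ".t" to try instead. The lexer will not back up a character to match the "." as a DOT on its own. That explains both messages: the first is the failed TAG match (mismatched 'b', expecting 'e'), and the second is the parser complaining that no DOT token arrived before the rest of the word.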
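
To make the failure concrete, here is a rough Python model of that behaviour. It is only a sketch under simplifying assumptions, not how ANTLR is actually implemented: the function name lex_committed is made up, error recovery is reduced to "drop the partial match and the offending character", and every run of lowercase letters is treated as a WORD (the literal 'a' and 'z' tokens of the grammar are not modelled).

```python
def lex_committed(text):
    """Toy lexer that commits to TAG as soon as it sees '.t' (no backtracking)."""
    tokens, errors, i = [], [], 0
    tag = '.test'
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                               # whitespace is hidden
        elif ch == '.':
            if text[i + 1:i + 2] == 't':
                # One character of lookahead selects TAG; from here on
                # there is no way back to a plain DOT.
                j = 0
                while j < len(tag) and text[i + j:i + j + 1] == tag[j]:
                    j += 1
                if j == len(tag):
                    tokens.append(('TAG', tag))
                    i += j
                else:
                    got = text[i + j:i + j + 1]
                    errors.append(
                        f"mismatched character {got!r} expecting {tag[j]!r}")
                    i += j + 1                   # partial match is thrown away
            else:
                tokens.append(('DOT', '.'))
                i += 1
        elif 'a' <= ch <= 'z':
            j = i
            while j < len(text) and 'a' <= text[j] <= 'z':
                j += 1
            tokens.append(('WORD', text[i:j]))
            i = j
        else:
            errors.append(f"unexpected character {ch!r}")
            i += 1
    return tokens, errors


if __name__ == '__main__':
    print(lex_committed('a .bcd z'))
    print(lex_committed('a .tbyfa z'))
```

For "a .tbyfa z" the model reports the mismatched 'b' and then produces the WORD "yfa" with no DOT in front of it, which matches the two messages in the question.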

One possible solution is to fold TAG into the DOT rule, like this:

grammar Test;

rule  : 'a' DOT WORD 'z';
WORD  : ('a'..'z')+;
DOT   : '.' (('test')=> 'test' {$type=TAG;})?;
SPACE :  (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};

fragment TAG : /* empty rule: only used to change the 'type' */;

The ('test')=> is a syntactic predicate which forces the lexer to look ahead to see if there really is "test" ahead. If this is true, "test" is matched and the type of the token is changed to TAG. And since 'test' is optional, the rule can always fall back on only the DOT token.
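
The same "look ahead first, then decide" idea can be sketched in plain Python. Again this is only an illustration with assumed names (lex_with_predicate is hypothetical, and all lowercase runs are treated as WORD); it is not the code ANTLR generates.

```python
def lex_with_predicate(text):
    """Toy lexer: after '.', peek for 'test' before deciding between TAG and DOT."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                               # whitespace is hidden
        elif ch == '.':
            # Plays the role of the ('test')=> predicate: check the upcoming
            # characters without consuming anything yet.
            if text.startswith('test', i + 1):
                tokens.append(('TAG', '.test'))  # type changed to TAG
                i += 5
            else:
                tokens.append(('DOT', '.'))      # fall back to a plain DOT
                i += 1
        elif 'a' <= ch <= 'z':
            j = i
            while j < len(text) and 'a' <= text[j] <= 'z':
                j += 1
            tokens.append(('WORD', text[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    return tokens


if __name__ == '__main__':
    print(lex_with_predicate('a .tbyfa z'))
    print(lex_with_predicate('a .test z'))
```

Note that in this sketch an input such as ".tester" comes out as a TAG followed by the WORD "er", because the check only asks whether "test" follows the dot, not what comes after it.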