I'm trying to use ANTLR4 to parse a file where elements can be the character "b" or simple literals. The problem appears when the Literal is just one character with a "b".

Here's a simplified grammar.

Lexer file:

B
    : 'b'
    ;

LETTER
    : [a-z]
    ;

LETTERS
    : LETTER+
    ;

Parser file:

pointer
    : B '.' LETTERS
    ;

b.f works but b.b doesn't: I get "line 1:2 mismatched input 'b' expecting LETTERS". How can I avoid the conflict between the two lexer rules without moving LETTER above B, which would only shift the problem to B?

Why would it expect LETTERS at column 0? Also, why does it say Letters with only a capital L? Are you sure you're running the same grammar you posted here? – sepp2k
Hi, I was testing with another grammar; this was the simplified version. I've just tested with this one and modified the question. Please notice the line "the problem appears when the Literal is just one character with a "b"". – moe

1 Answer

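First, note that the problem isn't limited to b: it occurs with any single letter. ANTLR's lexer picks the longest match and, when several rules match the same text, the rule that is defined first. Letters other than b would therefore be matched by the LETTER rule, which is still not the same token type as LETTERS. Since you never actually use LETTER in the parser, you can solve that part of the problem by simply removing LETTER from the grammar altogether (writing LETTERS as [a-z]+, or marking LETTER as a fragment so it never produces a token of its own).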

As far as B is concerned, this is what's known as a contextual keyword: something that matches the rule for an identifier (LETTERS in this case) and should be treated specially in some positions, but should still be allowed as an identifier in other positions. The common way to implement contextual keywords is to define a non-terminal for identifiers that can match either an actual identifier or any of the language's contextual keywords. So in your case, you could do this:

letters: LETTERS | B; // You can add "| LETTER" if you want to keep LETTER
pointer: B '.' letters;
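
Putting both suggestions together, a complete grammar might look roughly like the sketch below. It is written as a single combined grammar rather than separate lexer and parser files, and the grammar name Pointer and the trailing EOF are my own additions for illustration, not part of the question.

```antlr
grammar Pointer;               // hypothetical name, chosen for this sketch

// Parser rules
pointer : B '.' letters EOF ;  // EOF is an addition: it rejects trailing input
letters : LETTERS | B ;        // the contextual keyword B also counts as letters

// Lexer rules
B       : 'b' ;                // defined first, so a lone "b" always lexes as B
LETTERS : [a-z]+ ;             // LETTER removed; its character class is inlined
```

With this, b.b should lex as B, '.', B and b.f as B, '.', LETTERS, and both match pointer. A longer name such as b.bb still works, because the lexer prefers the longest match and "bb" can only be LETTERS.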