Let's say you're trying to parse the input "abc" (without the quotes). Now your field rule contains type identifier, and type can also match an identifier. So you could say that the parser should be able to match identifier identifier. But how should the input be divided? The parser could match "a" to the first identifier and "bc" to the second identifier. But it could also match "ab" to the first, and "c" to the second.
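To make the two readings concrete, here is a minimal Python sketch (not ANTLR itself) that enumerates every way an input can be divided between two consecutive identifier rules. The function name two_identifier_splits and the regular expression are illustrative assumptions, modeled on the LETTER (LETTER | DIGIT)* pattern.

```python
import re

# Assumed identifier shape: a letter or underscore, then letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def two_identifier_splits(text):
    """Return every (first, second) split where both halves are valid identifiers."""
    splits = []
    for i in range(1, len(text)):
        first, second = text[:i], text[i:]
        if IDENT.fullmatch(first) and IDENT.fullmatch(second):
            splits.append((first, second))
    return splits

print(two_identifier_splits("abc"))  # [('a', 'bc'), ('ab', 'c')]
```

More than one result means a character-level parser has no single correct way to read the input.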
The fact that the parser can create more than one parse from a single input is the ambiguity in your grammar (the error message you encountered). The cause is that you're trying to build identifiers at parse time, while you should build them at lexer time. So, if you make identifier a lexer rule that produces a single token, instead of a parser rule, all should be okay.
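As a rough illustration of why lexer-level identifiers remove the problem, the Python sketch below tokenizes greedily, so "abc" can only ever become one token. It is not how ANTLR is implemented. The token table, the Semi and Skip entries, and the tokenize helper are all hypothetical, and keyword literals such as 'int' are left out.

```python
import re

# Hypothetical token table; Identifier and Number mirror the grammar's lexer rules,
# Semi and Skip are assumptions added so that a full field can be tokenized.
TOKEN_SPEC = [
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Number", r"[0-9]+"),
    ("Semi", r";"),
    ("Skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token_name, lexeme) pairs, consuming as much as possible each time."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "Skip":  # whitespace separates tokens but is not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("abc"))       # [('Identifier', 'abc')]
print(tokenize("abc def;"))  # [('Identifier', 'abc'), ('Identifier', 'def'), ('Semi', ';')]
```

Because the token boundaries are fixed before parsing starts, the parser sees one Identifier per name and never has to choose where to divide it.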
And your lexer should not be creating ALPHA, DIGIT and LETTER tokens. These rules should only be used by other lexer rules (so they should be marked as "fragment" rules).
Lastly, just like the identifier rule, you should make your number rule a lexer rule instead of a parser rule (lexer rules start with a capital letter, parser rules with a lowercase letter):
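grammar twp3;

// Parser rules (start with a lowercase letter)
type : primitiveType | Identifier | 'any' 'defined' 'by' Identifier;
primitiveType : 'int' | 'string' | 'binary' | 'any';
field : 'optional'? type Identifier ';';

// Lexer rules (start with a capital letter)
Identifier : LETTER (LETTER | DIGIT)*;
Number : DIGIT+;

// Fragments: usable only inside other lexer rules, never emitted as tokens
fragment ALPHA : ('a'..'z'|'A'..'Z');
fragment DIGIT : ('0' .. '9');
fragment LETTER : ALPHA | '_';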