I'm starting to explore ANTLR and I'm trying to match this format: (test123 A0020 )
where:
- test123 is an identifier of at most 10 characters (letters and digits)
- A is the time indicator (for AM or PM): a single letter, either "A" or "P"
- 0020 is the time in a 4-digit format
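To make the intended format concrete, here is a minimal Java sketch that checks it with a plain regular expression, independent of ANTLR. The class name `FormatSketch`, the method `parse`, and the choice to accept both upper- and lowercase letters in the identifier are assumptions made for illustration; the grammar below only allows `[A-Z]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormatSketch {
    // Assumed shape: '(' identifier (1-10 letters/digits), whitespace,
    // one letter A or P, exactly four digits, optional whitespace, ')'.
    private static final Pattern FORMAT = Pattern.compile(
        "\\(\\s*([A-Za-z0-9]{1,10})\\s+([AP])([0-9]{4})\\s*\\)");

    // Returns {identifier, indicator, time}, or null if the input does not match.
    public static String[] parse(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("(test123 A0020 )");
        System.out.println(String.join(" | ", parts)); // prints "test123 | A | 0020"
    }
}
```

This is only a reference for what should and should not be accepted; it is not meant as a replacement for the grammar.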
I tried this grammar:
IDENTIFIER
:
( LETTER | DIGIT ) +
;
INT
:
DIGIT+
;
fragment
DIGIT
:
[0-9]
;
fragment
LETTER
:
[A-Z]
;
WS : [ \t\r\n(\s)+]+ -> channel(HIDDEN) ;
formatter: '(' information ')';
information :
information '/' 'A' INT
|IDENTIFIER ;
How can I resolve the ambiguity so that the time format is matched as 'A' INT and not as IDENTIFIER? Also, how can I add checks such as a maximum token length to the identifier? I know that this doesn't work in ANTLR: IDENTIFIER : (DIGIT | LETTER ) {2,10}
UPDATE:
I changed the rules to use semantic checks, but I still have the same ambiguity between the identifier and the time format. Here are the modified rules:
formatter
: information
| information '-' time
;
time :
timeMode timeCode;
timeMode:
{ getCurrentToken().getText().matches("[A,C]")}? MOD
;
timeCode: {getCurrentToken().getText().matches("[0-9]{4}")}? INT;
information: {getCurrentToken().getText().length() <= 10 }? IDENTIFIER;
MOD: 'A' | 'C';
The problem shows up in the parse tree: A0023 is matched to timeMode, and the parser complains that timeCode is missing.
IDENTIFIER: (LETTER | DIGIT) (LETTER | DIGIT) ... ten times. – Mephy
A0023 as a single TIME token? – Bart Kiers
P123, P12345, P. Correct? – Bart Kiers