I'm trying to create a (simple) lexer for bat/cmd files (for syntax coloring). As part of this task, I need to separate keywords from (arbitrary) identifiers. However, according to this answer, ANTLR lets the longest match win over shorter ones. My grammar looks like this so far:
lexer grammar CmdLexer;
Identifier
    : IdentifierNonDigit
      ( IdentifierNonDigit
      | Digit
      )+
    ;
Number
    : Digit+
    ;
fragment IdentifierNonDigit
    : [a-zA-Z_\u0080-\uffff]
    ;
fragment Digit
    : [0-9]
    ;
Punctuation
    : [\u0021-\u002f\u003a-\u0040\u005b-\u0060\u007b-\u007f]+
    ;
Keyword
    : A P P E N D
    | A T
    | A T T R I B
    | B R E A K
    | C A L L
    | C D
    | C H C P
    | C H D I R
    | C L S
    | C O L O R
    | C O P Y
    | D A T E
    | D E L
    | D I R
    | D O
    | E C H O
    | E D I T
    | E N D L O C A L
    | E Q U
    | E X I S T
    | E X I T
    | F C
    | F O R
    | F T Y P E
    | G O T O
    | G E Q
    | G T R
    | I F
    | I N
    | L E Q
    | L S S
    | M D
    | M K D I R
    | M K L I N K
    | M O R E
    | M O V E
    | N E Q
    | N O T
    | N U L
    | P A T H
    | P A U S E
    | P O P D
    | P U S H D
    | R D
    | R E N
    | R E N A M E
    | S E T
    | S E T L O C A L
    | S H I F T
    | S T A R T
    | T I T L E
    | T R E E
    | T Y P E
    | W H E R E
    | W H O A M I
    | X C O P Y
    ;
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
Whitespace
    : [ \t]+
      -> skip
    ;
Newline
    : ( '\r' '\n'?
      | '\n'
      )
      -> skip
    ;
LineComment
    : ( '@'? R E M ~[\r\n]*
      | '@'? '::' ~[\r\n]*
      )
      -> skip
    ;
but it always matches everything as Identifier, even words like append or CALL. I don't see how modes would solve this problem here, so how can I give a certain rule (here Keyword) higher priority over another (here Identifier)?
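For illustration, here is a minimal sketch of the rule ordering in question. It assumes ANTLR 4's documented lexer behaviour: the longest match wins, and when two rules match input of the same length, the rule listed first in the grammar wins. The grammar name PrioritySketch, the two-entry keyword list, and the simplified Identifier rule are hypothetical, cut down from the grammar above.

```antlr
lexer grammar PrioritySketch;

// Listed before Identifier: on a same-length match such as "echo" or "SET",
// the rule that appears first in the grammar is the one chosen.
Keyword
    : E C H O
    | S E T
    ;

// Still chosen for longer input such as "echoes" or "settings",
// because the longest match takes precedence over rule order.
Identifier
    : [a-zA-Z_] [a-zA-Z_0-9]*
    ;

Whitespace
    : [ \t\r\n]+ -> skip
    ;

// Case-insensitive letter fragments (only those used above).
fragment C : [cC];
fragment E : [eE];
fragment H : [hH];
fragment O : [oO];
fragment S : [sS];
fragment T : [tT];
```

Under that assumption, rule order only acts as a tie-breaker between equally long matches. In the grammar above Identifier is listed before Keyword, which would be consistent with Identifier winning for append and CALL.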