I'm trying to create a language using ANTLR in which each line consists of one instruction: an opcode followed by any number of operands, like so:
aaa "str1" "str2" 123
bbb 123 "str" 456
ccc
ddd
I have strings seemingly working OK, but integers seem to be parsed incorrectly.
Here's my complete grammar file:
grammar Insn;
prog: (line? NEWLINE)+;
line: instruction;
instruction: instruction_name instruction_operands?;
instruction_name: IDENTIFIER;
instruction_operands: instruction_operand instruction_operand*;
instruction_operand: ' '+ (operand_int | operand_string);
operand_int: INT;
operand_string: QSTRING;
NEWLINE : [\r\n]+;
IDENTIFIER: [a-zA-Z0-9_\-]+;
INT: '-'?[0-9]+;
QSTRING: '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"';
COMMENT: ';' ~[\r\n]* -> channel(HIDDEN);
I've tried several different INT definitions, such as INT: '-'?('0'..'9')+; and INT: '2'; (after changing every integer in the input to 2). Each attempt results in an error similar to line 1:18 extraneous input '123' expecting {' ', INT, QSTRING}, with the line number, column, and 123 replaced by whatever was being parsed.
Here's the parse tree generated by ANTLR's tooling as used in the ANTLR getting-started.md document.
I'm completely new to ANTLR and not familiar with much of the terminology, so please keep it simple for me.
INT: '-'?[0-9]+; may need an extra blank: INT: '-'? [0-9]+; - Dietmar Höhmann
123 is recognised as IDENTIFIER! Because it is a valid identifier (all INTs are). Both of them must be distinguishable. IDENTIFIER should probably be something like this: IDENTIFIER: [a-zA-Z][a-zA-Z0-9_\-]*; - Dietmar Höhmann
[...] INT definition before IDENTIFIER and making instruction_name: INT | IDENTIFIER; which seems to work for me now. I forgot to mention the requirement that instruction_name must also be valid as an integer. If you'd like to post your comment as an answer, I'll accept it, as it does answer the question I originally asked. - Kirby Gaming
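To illustrate why the rule order discussed in the comments matters: an ANTLR 4 lexer picks the rule that matches the longest stretch of input, and when two rules match the same length, the one defined first in the grammar wins. The Python sketch below is only a toy model of that selection rule, not ANTLR itself. The tokenize function, the SPACE rule name, and the regular expressions are hypothetical stand-ins for the lexer rules in the grammar above.

```python
import re

def tokenize(text, rules):
    """Toy model of lexer rule selection: the longest match wins,
    and ties go to the rule that appears earliest in `rules`."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical regex equivalents of the lexer rules in the question.
SPACE = ("SPACE", r" ")
IDENTIFIER = ("IDENTIFIER", r"[a-zA-Z0-9_\-]+")
INT = ("INT", r"-?[0-9]+")
QSTRING = ("QSTRING", r'"(?:[^"\\\r\n]|\\["\\])*"')

original_order = [SPACE, IDENTIFIER, INT, QSTRING]  # IDENTIFIER defined before INT
fixed_order = [SPACE, INT, IDENTIFIER, QSTRING]     # INT defined before IDENTIFIER

def names(tokens):
    """Token type names only, with the spaces dropped for readability."""
    return [name for name, _ in tokens if name != "SPACE"]

line = 'bbb 123 "str" 456'
print(names(tokenize(line, original_order)))
print(names(tokenize(line, fixed_order)))
```

With the original order, 123 and 456 come out as IDENTIFIER, which matches the "extraneous input '123'" error: the parser never sees an INT token. With INT first, they come out as INT. Note that under this model an input such as 123abc is still an IDENTIFIER, because the longer match takes priority over rule order. That is also why a purely numeric opcode needs the parser rule instruction_name: INT | IDENTIFIER; mentioned in the last comment.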