I have a problem with an ANTLR4 grammar in Java.

I would like to have a lexer rule that is able to match all of the following inputs:

  • Only letters
  • Letters and numbers
  • Only numbers

My code looks like this:

parser rule:

new_string: NEW_STRING+;

lexer rule:

NEW_DIGIT: [0-9]+;
STRING_CHAR : ~[;\r\n"'];
NEW_STRING: (NEW_DIGIT+ | STRING_CHAR+ | STRING_CHAR+ NEW_DIGIT+);

I know there must be an obvious solution, but I have been trying to find one, and I can't seem to figure out a way.

Thank you in advance!

1 Answer

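Since the first two lexer rules are not fragments, they produce tokens of their own: NEW_DIGIT will be matched if the input contains just digits, and STRING_CHAR if it is a single character from ~[;\r\n"']. The lexer prefers the longest match, and when several rules match an equally long stretch of input, the rule defined first wins.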

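In fact, STRING_CHAR accepts every character that NEW_STRING can (digits included), so NEW_STRING never wins a tie: digits-only input becomes NEW_DIGIT and a single character becomes STRING_CHAR. NEW_STRING is only produced when it can match a strictly longer run than the two rules before it.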
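
The rule choice described above can be checked with a small emulation. This is only a sketch that uses java.util.regex instead of the ANTLR runtime; the class LexerTieBreak and the method winner are hypothetical names, and the regexes are hand translations of the three lexer rules from the question. It picks the rule with the longest match at the start of the input and, on a tie, the rule declared first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: emulates only "longest match, then first rule wins".
// It is NOT the ANTLR runtime, just an illustration of the tie-breaking.
final class LexerTieBreak {
    // The question's lexer rules in declaration order (LinkedHashMap keeps it).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("NEW_DIGIT", Pattern.compile("[0-9]+"));
        RULES.put("STRING_CHAR", Pattern.compile("[^;\\r\\n\"']"));
        // NEW_STRING's three alternatives collapse to STRING_CHAR+ here,
        // because the original STRING_CHAR set already contains the digits.
        RULES.put("NEW_STRING", Pattern.compile("[^;\\r\\n\"']+"));
    }

    // Returns the name of the rule that would produce the first token,
    // or null if no rule matches at position 0.
    static String winner(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly longer only: an equally long later rule never replaces
            // an earlier one, which is the "first rule wins" part.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(winner("123"));    // NEW_DIGIT
        System.out.println(winner("a"));      // STRING_CHAR
        System.out.println(winner("abc123")); // NEW_STRING
    }
}
```

Under this emulation NEW_STRING is chosen only for a run that is longer than one character and not made of digits alone, so the parser rule new_string sees a mix of three token types for the three kinds of input.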

You need to:

  • make sure STRING_CHAR does not match digits
  • make NEW_DIGIT and STRING_CHAR fragments
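  • check the repetition operators (+): almost everything is allowed to repeat in your lexer (NEW_DIGIT already ends in + and is then repeated again as NEW_DIGIT+ inside NEW_STRING), which doesn't make sense at first look (but you need to adjust that according to your requirements, which we do not know)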

Like this:

fragment NEW_DIGIT: [0-9];
fragment STRING_CHAR : ~[;\r\n"'0-9];
NEW_STRING: (NEW_DIGIT+ | STRING_CHAR+ (NEW_DIGIT+)?);
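
To see which inputs the corrected NEW_STRING accepts as one token, the rule can be written out as a plain regular expression and tried in Java. This is a sketch under assumptions: the class NewStringCheck and the method isSingleToken are hypothetical, the regex is a hand translation with both fragments inlined, and nothing here goes through ANTLR itself.

```java
import java.util.regex.Pattern;

// Hypothetical check: the corrected NEW_STRING rule as a java.util.regex pattern.
final class NewStringCheck {
    // NEW_DIGIT+ | STRING_CHAR+ (NEW_DIGIT+)?  with the fragments inlined:
    //   NEW_DIGIT   = [0-9]
    //   STRING_CHAR = ~[;\r\n"'0-9]
    private static final Pattern NEW_STRING =
            Pattern.compile("[0-9]+|[^;\\r\\n\"'0-9]+[0-9]*");

    // True if the whole input is covered by a single NEW_STRING match.
    static boolean isSingleToken(String input) {
        return NEW_STRING.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abc123", "123", "123abc"}) {
            System.out.println(s + " -> " + isSingleToken(s));
        }
    }
}
```

The three cases from the question (only letters, letters followed by numbers, only numbers) each come out as a single match. Input such as 123abc or a1b does not, so the lexer would split it into several NEW_STRING tokens, which the parser rule new_string: NEW_STRING+ still accepts. If such input should be one token, the alternatives need adjusting to your requirements.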