I have a simple lexer/grammar I've been working on and I'm having trouble understanding the standard operating procedure for matching formatted variables. I am trying to match the following:

  1. A variable name must be at least one character long. If it is exactly one character, it must be an uppercase or lowercase letter.
  2. If it is longer than one character, it must begin with a letter of either case, followed by any number of letters, digits, underscores, or dollar signs.

I've rewritten this several times, in many flavors, and I always get the following error:

Decision can match input such as "SINGLELETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

I would really appreciate some insight. I understand there is some ambiguity in my grammar, but I am confused about why multiple alternatives can still match once we enter the original matching loop. Thank you!

variablename 
    :   (SINGLELETTER)
    |   (SINGLELETTER|UNDERSCORE)( SINGLELETTER|UNDERSCORE | DOLLAR | NUMBER)*;

SINGLELETTER    :   ( 'a'..'z' | 'A'..'Z');


fragment LOWERCASE  :   'a'..'z';
fragment UNDERSCORE :   '_';
fragment DOLLAR :   '$';  
fragment NUMBER :   '0'..'9';

1 Answer


Why not make Variablename a lexer rule that produces a single token for the entire name?

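// The ( ... )* loop matches zero or more characters, so this single alternative
// also covers one-character names; a separate SINGLELETTER alternative is redundant.
Variablename 
    :   (SINGLELETTER | UNDERSCORE) (SINGLELETTER | UNDERSCORE | DOLLAR | NUMBER)*;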

fragment SINGLELETTER   :   ( 'a'..'z' | 'A'..'Z');


fragment LOWERCASE  :   'a'..'z';
fragment UNDERSCORE :   '_';
fragment DOLLAR :   '$';  
fragment NUMBER :   '0'..'9';

Also, your variablename rule does not follow point #2 of your description: the grammar allows a name to start with _, but your description requires a leading letter.
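
If the description in the question is what you actually want (a name must start with a letter), a minimal sketch could look like the following. It assumes ANTLR 3 syntax; the grammar name Vars and the fragment names LETTER and DIGIT are made up for illustration and are not from the original grammar.

```
grammar Vars;   // hypothetical grammar name

// Parser rule: consumes the single token produced by the lexer.
variablename
    :   VARIABLENAME
    ;

// Lexer rule: one letter, then zero or more letters, digits, '_' or '$'.
// The ( ... )* loop may match nothing, so one-letter names are already covered.
VARIABLENAME
    :   LETTER (LETTER | DIGIT | UNDERSCORE | DOLLAR)*
    ;

// Fragments are helpers for other lexer rules and never become tokens themselves.
fragment LETTER     :   'a'..'z' | 'A'..'Z';
fragment DIGIT      :   '0'..'9';
fragment UNDERSCORE :   '_';
fragment DOLLAR     :   '$';
```

Because fragment rules never produce tokens of their own, a parser rule cannot match UNDERSCORE, DOLLAR, or NUMBER directly. Keeping the whole name in one lexer rule avoids that problem, and dropping the separate single-letter alternative should also remove the "multiple alternatives" warning, since only one alternative can then match a lone letter.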