ANTLR4 lexer rule ensuring expression does not end with character

Question

I have a syntax that I need to match, as in the following example:

some-Text->more-Text

Given this example, I need ANTLR4 lexer rules that match 'some-Text' and 'more-Text' with one lexer rule, and '->' with another.

I am using the lexer rules shown below as my starting point. The trouble is that the '-' character is allowed in the NAMEDELEMENT rule, so the first NAMEDELEMENT match becomes 'some-Text-', and the '->' is then not captured by the EDGE rule.

I'm looking for a way to ensure that the '-' is not captured as the last character in the NAMEDELEMENT rule (or some other alternative that produces the desired result).
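One predicate-free way to state this requirement (a suggestion, not taken from the question or the answer, and untested in ANTLR itself) is to allow '-' only before another name character, so the token can never end with '-'. In ANTLR4 syntax that could be written as NAMEDELEMENT : [a-zA-Z_@] ([a-zA-Z0-9_-]* [a-zA-Z0-9_])? ; with the lexer's longest-match behaviour then stopping at 'some-Text'. The JavaScript sketch below only models that shape: a sticky regular expression for each rule plus a hypothetical tokenizeWithRegex loop that keeps the longest match at each position, roughly as an ANTLR lexer does. It is not generated ANTLR code.

```javascript
// Model of a predicate-free NAMEDELEMENT: '-' may appear inside the name,
// but the last character must be a letter, digit or '_'.
// tokenizeWithRegex and the "TYPE:text" strings are hypothetical, for illustration only.
const NAMEDELEMENT = /[a-zA-Z_@](?:[a-zA-Z0-9_-]*[a-zA-Z0-9_])?/y;
const EDGE = /->/y;
const RULES = [["EDGE", EDGE], ["NAMEDELEMENT", NAMEDELEMENT]];

function tokenizeWithRegex(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let best = null;
    for (const [type, re] of RULES) {
      re.lastIndex = pos; // sticky (y) regexes only match starting at lastIndex
      const m = re.exec(text);
      // Keep the longest match; on a tie the earlier rule wins, as in ANTLR.
      if (m !== null && (best === null || m[0].length > best.text.length)) {
        best = { type, text: m[0] };
      }
    }
    if (best === null) throw new Error(`no token matches at position ${pos}`);
    tokens.push(`${best.type}:${best.text}`);
    pos += best.text.length;
  }
  return tokens;
}

console.log(tokenizeWithRegex("some-Text->more-Text").join(" "));
// NAMEDELEMENT:some-Text EDGE:-> NAMEDELEMENT:more-Text
```

For this pattern, the regex's greedy-then-backtrack matching yields the same result as ANTLR's longest match: the longest prefix that does not end in '-'.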

EDGE
    :   '->'
    ;

NAMEDELEMENT  
    :   ('a'..'z'|'A'..'Z'|'_'|'@') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-')* { _input.LA(1) != '-' && _input.LA(2) != '>' }?
    ;

I'm trying to use the predicate above to look ahead for a sequence of '-' and '>', but it doesn't seem to work. In fact, it doesn't seem to do anything at all: I get the same parsing results with and without the predicate.
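A small JavaScript model suggests two separate reasons why this predicate has no useful effect. It assumes the JavaScript target (the one the accepted answer below uses), and it assumes that the lexer evaluates a right-edge predicate with the input positioned just after each candidate token. First, LA() returns a numeric character code, and in JavaScript a comparison such as 45 != '-' is always true, because '-' converts to NaN. Second, even when the codes are converted to characters, the test is inverted: it rejects the candidate 'some-Text' (the next characters are '-' and '>') and accepts 'some-Text-' (the next characters are '>' and 'm'). makeInput is a hypothetical stand-in for the lexer's input stream, not the ANTLR runtime API.

```javascript
// Hypothetical stand-in for ANTLR's CharStream.LA(i): LA(1) is the next
// unconsumed character code, LA(2) the one after it; -1 stands in for EOF.
function makeInput(text, index) {
  return {
    LA(i) {
      const pos = index + i - 1;
      return pos < text.length ? text.charCodeAt(pos) : -1;
    },
  };
}

// The question's predicate as written: a number compared with a string.
function predicateAsWritten(input) {
  return input.LA(1) != "-" && input.LA(2) != ">";
}

// The same test after converting the char codes to one-character strings.
function predicateOnChars(input) {
  const c1 = String.fromCharCode(input.LA(1));
  const c2 = String.fromCharCode(input.LA(2));
  return c1 != "-" && c2 != ">";
}

const text = "some-Text->more-Text";
for (const candidate of ["some-Text", "some-Text-"]) {
  const input = makeInput(text, candidate.length); // positioned just after the candidate
  console.log(candidate, predicateAsWritten(input), predicateOnChars(input));
}
// some-Text true false
// some-Text- true true
```

Read this way, the predicate would need to reject the candidate whose next character is '>', rather than look for '->' after the token.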

The parser rules are as follows, where I am matching on 'selector' rules:

selector
    :   namedelement (edge namedelement)*
    ;

edge
    :   EDGE
    ;

namedelement
    :   NAMEDELEMENT
    ;

Thanks in advance!

Odinhaus · Accepted Answer · 2019-02-07T23:36:57

After messing around with this for hours, I have a syntax that works, though I fail to see how it is functionally any different than what I posted in the original question.

(I use the uncommented version so that I can put a breakpoint in the generated lexer to check that the equality test evaluates correctly.)

NAMEDELEMENT  
    //: [a-zA-Z_@] [a-zA-Z_-]* { String.fromCharCode(this._input.LA(1)) != ">" }? 
    : [a-zA-Z_@] [a-zA-Z_-]* { (function(a){
            var c = String.fromCharCode(a._input.LA(1));
            return c != ">";
        })(this)
    }? 
    ;

My target language is JavaScript and both the commented and uncommented forms of the predicate work fine.
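A likely explanation of why this version works where the question's did not: it converts the character code with String.fromCharCode before comparing, and it tests the right condition, rejecting a candidate token when the character after it is '>', which is exactly the case where the token has swallowed the '-' of '->'. The JavaScript sketch below is a hypothetical simulation of that behaviour, not ANTLR's actual lexer. It assumes the lexer considers every prefix matching [a-zA-Z_@] [a-zA-Z_-]*, evaluates the predicate with the input just after that prefix, and keeps the longest prefix that passes. matchNamedElement and tokenizeWithPredicate are illustrative names only.

```javascript
// Hypothetical simulation of NAMEDELEMENT with the answer's right-edge predicate:
// every prefix matching [a-zA-Z_@] [a-zA-Z_-]* is a candidate, the predicate is
// checked against the character after it, and the longest passing candidate wins.
const FIRST = /[a-zA-Z_@]/;
const REST = /[a-zA-Z_-]/;

function matchNamedElement(text, start) {
  if (start >= text.length || !FIRST.test(text[start])) return null;
  let best = -1; // end index of the longest candidate accepted so far
  let end = start + 1;
  while (true) {
    // Predicate from the answer: the next character must not be '>'.
    const next = end < text.length ? text[end] : "";
    if (next != ">") best = end;
    if (end < text.length && REST.test(text[end])) end++;
    else break;
  }
  return best === -1 ? null : text.slice(start, best);
}

function tokenizeWithPredicate(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    // NAMEDELEMENT cannot start with '-', so it never competes with EDGE here.
    const name = matchNamedElement(text, pos);
    if (name !== null) {
      tokens.push(`NAMEDELEMENT:${name}`);
      pos += name.length;
    } else if (text.startsWith("->", pos)) {
      tokens.push("EDGE:->");
      pos += 2;
    } else {
      throw new Error(`no token matches at position ${pos}`);
    }
  }
  return tokens;
}

console.log(tokenizeWithPredicate("some-Text->more-Text").join(" "));
// NAMEDELEMENT:some-Text EDGE:-> NAMEDELEMENT:more-Text
```

Note that the answer's second character class, [a-zA-Z_-], no longer includes the digits 0-9 that the question's rule allowed; if names can contain digits, the class would need 0-9 added back.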
