1
votes

I have a syntax where I need to match given the following example:

some-Text->more-Text

From this example, I need ANTLR4 lexer rules that would match 'some-Text' and 'more-Text' into one lexer rule, and the '->' as another rule.

I am using the lexer rules shown below as my starting point, but the trouble is, the '-' character is allowed in the NAMEDELEMENT rule, which causes the first NAMEDELEMENT match to become 'some-Text-', which then causes the '->' to not be captured by the EDGE rule.

I'm looking for a way to ensure that the '-' is not captured as the last character in the NAMEDELEMENT rule (or some other alternative that produces the desired result).

EDGE
    :   '->'
    ;

NAMEDELEMENT  
    :   ('a'..'z'|'A'..'Z'|'_'|'@') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-')* { _input.LA(1) != '-' && _input.LA(2) != '>' }?
    ;

Im trying to use the predicate above to look ahead for a sequence of '-' and '>', but it doesn't seem to work. It doesn't seem to do anything at all, actually, as get the same parsing results both with and without the predicate.

The parser rules are as follows, where I am matching on 'selector' rules:

selector
    :   namedelement (edge namedelement)*
    ;

edge
    :   EDGE
    ;

namedelement
    :   NAMEDELEMENT
    ;

Thanks in advance!

2

2 Answers

0
votes

After messing around with this for hours, I have a syntax that works, though I fail to see how it is functionally any different than what I posted in the original question.

(I use the uncommented version so that I can put a break point in the generated lexer to ensure that the equality test is evaluating correctly.)

NAMEDELEMENT  
    //: [a-zA-Z_@] [a-zA-Z_-]* { String.fromCharCode(this._input.LA(1)) != ">" }? 
    : [a-zA-Z_@] [a-zA-Z_-]* { (function(a){
            var c = String.fromCharCode(a._input.LA(1));
            return c != ">";
        })(this)
    }? 
    ;

My target language is JavaScript and both the commented and uncommented forms of the predicate work fine.

0
votes

Try this:

NAMEDELEMENT
 : [a-zA-Z_@] ( '-' {_input.LA(1) != '>'}? | [a-zA-Z0-9_] )*
 ;

Not sure if _input.LA(1) != '>' is OK with the JavaScript runtime, but in Java it properly tokenises "some-->more" into "some-", "->" and "more".