0
votes

How can I recognize different tokens for the same symbol in ANTLR v4? For example, in selected = $("library[title='compiler'] isbn"); the first = is an assignment, whereas the second = is an operator.

Here are the relevant lexer rules:

EQUALS
:
    '='
;

OP
:
    '|='
    | '*='
    | '~='
    | '$='
    | '='
    | '!='
    | '^='
;

And here is the parser rule for that line:

assign
:
    ID EQUALS DOLLAR OPEN_PARENTHESIS QUOTES ID selector ID QUOTES
    CLOSE_PARENTHESIS SEMICOLON
;

selector
:
    OPEN_BRACKET ID OP APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;

This parses the line correctly, as long as the operator inside the selector is anything other than =.

Here is the error log:

JjQueryParser::init:34:29: mismatched input '=' expecting OP
JjQueryParser::init:34:39: mismatched input ''' expecting '\"'
JjQueryParser::init:34:46: mismatched input '"' expecting '='
2
Without seeing the rest of your lexer rules (and their order) and the specific error, I can't tell you what the problem is. From what you've posted so far, though, I don't see a non-greedy operator, which is the recommended way to match text inside quotes. - MacGyver
@MacGyver I've appended the error log. The goal is to unify with one of the strings. Should I do it differently? - Henrique Ferrolho
Is there some rule that comes before the OP lexer rule that matches '='? That's usually what this "mismatched input" error means. - MacGyver
Only the EQUALS token, @MacGyver. - Henrique Ferrolho
Besides your problem: from another thread I know that you have already embedded these expressions in a Java program. Your jQuery expressions are a subset of JavaScript, and JavaScript would parse a simple string literal instead of QUOTES ID selector ID QUOTES; that string is then parsed in a second phase at runtime. I think some things get easier if you delay the string parsing to semantic analysis. - CoronA

2 Answers

1
votes

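The problem cannot be solved in the lexer with two rules that both match '=', since the lexer always returns one token type for the same string: when several rules match input of the same length, the rule defined first wins, which here is EQUALS. It is easy to resolve in the parser, though. Replace the EQUALS and OP lexer rules with parser rules (lower-case names) and reference equals and op from assign and selector: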

equals
:
    '='
;

op
:
    '|='
    | '*='
    | '~='
    | '$='
    | '='
    | '!='
    | '^='
;
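
To see why the original lexer can never emit OP for a bare =, here is a minimal Python sketch of the selection rule described above (longest match wins, ties go to the rule defined first). It is a simplified model for illustration, not ANTLR's actual implementation; the names RULES and next_token are hypothetical.

```python
# Simplified model of how a lexer picks a token: the longest match wins,
# and when two rules match the same length, the rule listed first wins.
# This is an illustration only, not ANTLR's real algorithm or API.
RULES = [
    ("EQUALS", ["="]),
    ("OP", ["|=", "*=", "~=", "$=", "=", "!=", "^="]),
]


def next_token(text, pos, rules=RULES):
    """Return (rule_name, lexeme) for the best match at text[pos:], or None."""
    best = None
    for name, literals in rules:
        for lit in literals:
            if text.startswith(lit, pos):
                # Only a strictly longer match replaces the current best,
                # so an equal-length match keeps the earlier rule.
                if best is None or len(lit) > len(best[1]):
                    best = (name, lit)
    return best


print(next_token("=", 0))   # ('EQUALS', '=')
print(next_token("!=", 0))  # ('OP', '!=')
```

In this model, swapping the order of the two rules would make every = an OP instead, so no ordering yields both token types for the same character.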
1
votes

I had the same issue. Resolved in the lexer as follows:

EQUALS: '=';
OP    : '|' EQUALS
      | '*' EQUALS
      | '~' EQUALS
      | '$' EQUALS
      | '!' EQUALS
      | '^' EQUALS
      ;

This guarantees that the symbol '=' is always represented by a single token type, EQUALS, since OP no longer matches a bare '='. Don't forget to update the selector rule so that it accepts either token:

selector
:
    OPEN_BRACKET ID (OP | EQUALS) APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;
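
As a rough check of this approach, the following Python sketch tokenizes a selector with a lexer shaped like the one above and then tests the token sequence against the updated selector rule. It only mimics the grammar with regular expressions. The ID pattern is an assumption (the original ID rule is not shown), and tokenize and is_selector are hypothetical helper names.

```python
import re

# Token patterns modelled on the lexer above. OP no longer matches a bare '='.
# The ID pattern is an assumption; the original ID rule is not shown.
TOKEN_SPEC = [
    ("OP", r"[|*~$!^]="),
    ("EQUALS", r"="),
    ("OPEN_BRACKET", r"\["),
    ("CLOSE_BRACKET", r"\]"),
    ("APOSTROPHE", r"'"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return the list of token type names for text, or raise on bad input."""
    types = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        types.append(match.lastgroup)
        pos = match.end()
    return types


def is_selector(types):
    """Check a token sequence against the updated selector rule."""
    expected = [
        {"OPEN_BRACKET"}, {"ID"}, {"OP", "EQUALS"},
        {"APOSTROPHE"}, {"ID"}, {"APOSTROPHE"}, {"CLOSE_BRACKET"},
    ]
    return len(types) == len(expected) and all(
        t in allowed for t, allowed in zip(types, expected)
    )


print(tokenize("[title='compiler']")[2])          # EQUALS
print(tokenize("[title^='comp']")[2])             # OP
print(is_selector(tokenize("[title='compiler']")))  # True
```

Because a bare = only ever produces EQUALS here, the assign rule can keep using EQUALS unchanged, and only selector needs the (OP | EQUALS) alternative.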