ANTLR C# 4 Grammar - Precedence

Question

I have the grammar below (simplified for demonstration) and I'm having a problem in a particular case related to logical operators.

Everything I've tested works except the case where a logical operator appears inside my quoted identifier. For example, this works:

@M = "ABC12345"

But this does not:

@M = "ABC12OR345"

What happens is that the OR inside the string causes the following error:

extraneous input 'OR' expecting {'"', LOWCHAR, HIGHCHAR, DIGIT}

I'm at a loss as to how to get the precedence correct.

Thanks

            grammar PRDL;

            options
            {
                    language=CSharp;
            }

            statement
                : expression ( logicalOperator expression )*
                ;

            logicalOperator
                : logicalOR | logicalAND
                ;

            logicalOR
                : OR
                ;

            logicalAND
                : AND
                ;

            expression
                : mVar
                | nVar
                | parenStatement
                | notExpression
                ;

            parenStatement
                : LPAREN statement RPAREN
                ;

            notExpression
                : NOT expression
                ;

            mVar
                : M equalityOperator quotedIdentifier
                ;

            nVar
                : N equalityOperator quotedIdentifier
                ;

            equalityOperator
                    : EQUAL
                    ;

            quotedIdentifier
                : '"' identifier '"'
                ;

            identifier
                : (HIGHCHAR | LOWCHAR | DIGIT)+
                ;


            // ============  Lexer Definitions  ========================

            // OPERATORS

            NOT_ALLOWED : '*' | '/' | '+' | '-' | '#' | '$' | '%' | '^';

            EQUAL       : '=';

            COMMA       : ',';

            LPAREN      : '(';

            RPAREN      : ')';

            LPARENSQ    : '[';

            RPARENSQ    : ']';

            OR          : ('OR' | 'or' | '||');

            AND         : ('AND' | 'and' | '&&');

            NOT         : ('NOT' | 'not' | '!') ;

            M               : '@M';

            N               : '@N';

            LOWCHAR     : 'a'..'z';

            HIGHCHAR    : 'A'..'Z';

            DIGIT       : '0'..'9';


            // Whitespace -- ignored
            WS          : [ \n\t\r\f]+ -> skip;

Lucas Trzesniewski · Accepted Answer · 2015-03-29T00:30:48

You drew the line between lexer and parser in the wrong place:

quotedIdentifier
    : '"' identifier '"'
    ;

identifier
    : (HIGHCHAR | LOWCHAR | DIGIT)+
    ;

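Right now, every single letter and digit becomes its own token. The lexer runs before the parser and knows nothing about the surrounding quotes, so when it reaches the OR inside "ABC12OR345" it picks the longest match at that position, which is the OR token rather than a HIGHCHAR. The parser then finds an OR where it expected another character or the closing quote, and that is the error you get.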

These two parser rules should actually be lexer rules:

QUOTED_IDENTIFIER
    : '"' (HIGHCHAR | LOWCHAR | DIGIT)+ '"'
    ;

IDENTIFIER
    : (HIGHCHAR | LOWCHAR | DIGIT)+
    ;

And HIGHCHAR, LOWCHAR and DIGIT should be fragments to prevent getting a different token type on single characters:

fragment LOWCHAR     : 'a'..'z';
fragment HIGHCHAR    : 'A'..'Z';
fragment DIGIT       : '0'..'9';

With such a lexer, you'll get one token per identifier, which is much better for parsing.
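To make the difference concrete, here is a small Python sketch that imitates how an ANTLR lexer picks tokens: longest match at each position, with ties going to the rule defined first. It is only a simulation, not ANTLR itself. The rule tables are a trimmed subset of the grammar in the question, `token_types` is a hypothetical helper, and I'm assuming the two new rules are placed after OR, AND and NOT so that a bare OR still lexes as the operator.

```python
import re

# Rules in grammar order. Like ANTLR, the simulator takes the longest match
# at each position and lets the earlier rule win a tie.
ORIGINAL_RULES = [
    ("QUOTE", r'"'),            # implicit token created by the '"' literal
    ("EQUAL", r"="),
    ("OR", r"OR|or|\|\|"),
    ("AND", r"AND|and|&&"),
    ("NOT", r"NOT|not|!"),
    ("M", r"@M"),
    ("N", r"@N"),
    ("LOWCHAR", r"[a-z]"),
    ("HIGHCHAR", r"[A-Z]"),
    ("DIGIT", r"[0-9]"),
    ("WS", r"[ \n\t\r\f]+"),
]

# Revised lexer: the fragments are inlined as character classes, so single
# characters no longer produce tokens of their own.
REVISED_RULES = [
    ("EQUAL", r"="),
    ("OR", r"OR|or|\|\|"),
    ("AND", r"AND|and|&&"),
    ("NOT", r"NOT|not|!"),
    ("M", r"@M"),
    ("N", r"@N"),
    ("QUOTED_IDENTIFIER", r'"[A-Za-z0-9]+"'),
    ("IDENTIFIER", r"[A-Za-z0-9]+"),
    ("WS", r"[ \n\t\r\f]+"),
]


def token_types(text, rules):
    """Return the token type names for text, skipping whitespace."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    types, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strictly longer only, so the earlier rule keeps a tie.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_name != "WS":
            types.append(best_name)
        pos += best_len
    return types


source = '@M = "ABC12OR345"'
print(token_types(source, ORIGINAL_RULES))  # an OR token shows up mid-string
print(token_types(source, REVISED_RULES))   # the whole string is one token
```

With the original rules the string is split into single-character tokens with an OR in the middle; with the revised rules it comes out as one QUOTED_IDENTIFIER. Note the ordering assumption: if IDENTIFIER were defined before OR, a bare OR would tie on length and lex as IDENTIFIER instead.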

Also, rules like these are pretty much useless:

equalityOperator
        : EQUAL
        ;

It just aliases a lexer rule with a parser rule; you can use EQUAL directly in mVar and nVar instead.
