ANTLR4 Grammar - Issue with "dot" in fields and extended expressions

Question

I have the following ANTLR4 grammar:

grammar ExpressionGrammar;

parse: (expr)
     ;

expr: MIN expr
    | expr ( MUL | DIV ) expr
    | expr ( ADD | MIN ) expr
    | NUM
    | function
    | '(' expr ')'
    ;

function : ID '(' arguments? ')';

arguments: expr ( ',' expr)*;

/* Tokens */

MUL : '*';
DIV : '/';
MIN : '-';
ADD : '+';
OPEN_PAR : '(' ;
CLOSE_PAR : ')' ;

NUM : '0' | [1-9][0-9]*;
ID : [a-zA-Z_] [a-zA-Z]*;
COMMENT: '//' ~[\r\n]* -> skip;
WS: [ \t\n]+ -> skip;

I have an input expression like this:

(Fields.V1)*(Fields.V2) + (Constants.Value1)*(Constants.Value2)

The ANTLR parser generated the following text from the grammar above:

(FieldsV1)*(FieldsV2)+(Constants<missing ')'>

As you can see, the dots in Fields.V1 and Fields.V2 are missing from the text, and there is also a <missing ')'> error node. I believe I somehow need to make ANTLR understand that an expression can also contain fields with dot operators.

A follow-up question:

 (Var1)(Var2)

ANTLR does not report an error for the scenario above. An expression should never look like (Var1)(Var2); there should always be an operator, as in (Var1)*(Var2) or (Var1)+(Var2). The parse tree does not contain an error node for this case. How should the grammar be modified so that this scenario is also reported as an error?

Start by adding EOF to your parse rule (and remove the unnecessary parentheses). — Mike Lischke
I'm not sure what you expect to happen: I don't see any rule in your lexer that matches a '.'. — Bart Kiers
@BartKiers That's right. I want to know where exactly I should add that matching, because my parser shouldn't actually split on the dot. It should treat "Fields.V1" as a whole. — Veryon890
@MikeLischke EOF: are you referring to the first rule, parse: (expr)? Do you want me to change it to expr EOF? — Veryon890
"It should treat "Fields.V1" as a whole": then you should edit your ID rule so that it also includes the '.'. But I'm confused: is this your own grammar, or did you find it somewhere? I get the impression you're blindly trying things without really understanding ANTLR. Perhaps take a step back and start with a basic ANTLR tutorial? — Bart Kiers

Mike Cargal · Accepted Answer · 2021-02-02T14:37:51

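No lexer rule in your grammar matches '.', so the lexer reports a token recognition error for each dot and skips it; that is why the dots vanish from the output. To recognize IDs like Fields.V1, change your lexer rule for ID to something like this: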

fragment ID_NODE: [a-zA-Z_][a-zA-Z0-9]*;
ID: ID_NODE ('.' ID_NODE)*;

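Notice that, since each "node" of the ID follows the same rule, I made it a lexer fragment that I could use to compose the ID rule (a fragment is never emitted as a token on its own; it can only be referenced from other lexer rules). I also added 0-9 to the second part of the fragment, since it appears that you want to allow digits in IDs (V1, Value2).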

Then the ID rule uses the fragment to build out the Lexer rule that allows for dots in the ID.
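As a rough illustration (not ANTLR itself), here is a Python sketch that mirrors the lexer rules with the re module. The names ID_NODE, OLD_ID, NEW_ID and tokenize are hypothetical; like ANTLR's default lexer, it reports and skips any character no rule matches, and for brevity it lumps all operator tokens into a single OP kind. Python's regex alternation takes the first alternative that matches rather than ANTLR's longest match, but for these particular rules the two should produce the same tokens.

```python
import re

# Hypothetical Python mirror of the lexer rules (this is not ANTLR's API).
# fragment ID_NODE: [a-zA-Z_][a-zA-Z0-9]*;
ID_NODE = r"[a-zA-Z_][a-zA-Z0-9]*"
OLD_ID = r"[a-zA-Z_][a-zA-Z]*"          # the question's original ID rule
NEW_ID = rf"{ID_NODE}(?:\.{ID_NODE})*"  # ID: ID_NODE ('.' ID_NODE)*;


def tokenize(text, id_pattern):
    """Return (tokens, errors). Unmatched characters are recorded and skipped,
    roughly like ANTLR's default 'token recognition error' handling."""
    spec = [
        ("COMMENT", r"//[^\r\n]*"),  # listed before OP so '//' is not read as DIV
        ("WS", r"[ \t\r\n]+"),
        ("NUM", r"0|[1-9][0-9]*"),
        ("ID", id_pattern),
        ("OP", r"[*/+\-(),]"),       # MUL, DIV, ADD, MIN, parens, comma
    ]
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in spec))
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            errors.append(text[pos])  # e.g. the '.' under the old ID rule
            pos += 1
            continue
        if m.lastgroup not in ("WS", "COMMENT"):  # mimic '-> skip'
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors


if __name__ == "__main__":
    print(tokenize("(Fields.V1)", OLD_ID))
    # ([('OP', '('), ('ID', 'Fields'), ('ID', 'V'), ('NUM', '1'), ('OP', ')')], ['.'])
    print(tokenize("(Fields.V1)", NEW_ID))
    # ([('OP', '('), ('ID', 'Fields.V1'), ('OP', ')')], [])
```

With the original rule the dot is dropped, and V1 even splits into ID V plus NUM 1 because the old ID rule allows no digits. With the new rule each dotted name comes out as a single ID token.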

You also didn't add ID as an alternative of expr, so even a correctly tokenized Fields.V1 could not appear as an expression on its own.

To detect the error in (Var1)(Var2), you need Mike Lischke's advice: add the built-in EOF token to the end of the parse rule. Without EOF, ANTLR stops parsing as soon as it has recognized a complete expr ((Var1)) and silently ignores the rest of the input. EOF says "and then the input must end", so ANTLR continues on into the (Var2) and reports the error there.
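To make the EOF point concrete, here is a small hand-written recursive-descent parser in Python that roughly mirrors the revised grammar. It is not what ANTLR generates, and Parser, ParseError, parse and the require_eof flag are hypothetical names; comments are not handled. Unary minus binds tighter than * and /, which bind tighter than + and -, following the alternative order in expr. Setting require_eof=False imitates a parse rule without EOF.

```python
import re

# Hypothetical hand-written parser that roughly mirrors the revised grammar;
# it is NOT the code ANTLR generates.
TOKEN_RE = re.compile(
    r"(?P<WS>\s+)"
    r"|(?P<NUM>0|[1-9][0-9]*)"
    r"|(?P<ID>[a-zA-Z_][a-zA-Z0-9]*(?:\.[a-zA-Z_][a-zA-Z0-9]*)*)"
    r"|(?P<OP>[-+*/(),])"
)


class ParseError(Exception):
    pass


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"token recognition error at {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))  # explicit end-of-input marker
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def take(self, expected=None):
        value = self.tokens[self.i][1]
        if expected is not None and value != expected:
            raise ParseError(f"expected {expected!r}, found {value!r}")
        self.i += 1
        return value

    def expr(self):  # lowest precedence: '+' and '-'
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):  # '*' and '/'
        node = self.unary()
        while self.peek()[1] in ("*", "/"):
            op = self.take()
            node = (op, node, self.unary())
        return node

    def unary(self):  # MIN expr
        if self.peek()[1] == "-":
            self.take()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):  # NUM | ID | function | '(' expr ')'
        kind, value = self.peek()
        if kind == "NUM":
            return int(self.take())
        if kind == "ID":
            name = self.take()
            if self.peek()[1] != "(":
                return name
            self.take("(")  # function: ID '(' arguments? ')'
            args = []
            if self.peek()[1] != ")":
                args.append(self.expr())
                while self.peek()[1] == ",":
                    self.take()
                    args.append(self.expr())
            self.take(")")
            return ("call", name, args)
        if value == "(":
            self.take()
            node = self.expr()
            self.take(")")
            return node
        raise ParseError(f"unexpected {value!r}")


def parse(text, require_eof=True):
    p = Parser(text)
    tree = p.expr()
    # parse: expr EOF;  -- without this check, trailing tokens are ignored
    if require_eof and p.peek()[0] != "EOF":
        raise ParseError(f"extraneous input {p.peek()[1]!r}, expecting <EOF>")
    return tree


if __name__ == "__main__":
    print(parse("(Var1)(Var2)", require_eof=False))  # Var1 (the rest is ignored)
    try:
        parse("(Var1)(Var2)")
    except ParseError as err:
        print("error:", err)  # error: extraneous input '(', expecting <EOF>
```

The only difference between the two calls is the final end-of-input check, which is the role EOF plays in parse: expr EOF;.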

A revised version that handles both of your examples:

grammar ExpressionGrammar;

parse: expr EOF;

expr:
    MIN expr
    | expr ( MUL | DIV) expr
    | expr ( ADD | MIN) expr
    | NUM
    | ID
    | function
    | '(' expr ')';

function: ID '(' arguments? ')';

arguments: expr ( ',' expr)*;

/* Tokens */

MUL: '*';
DIV: '/';
MIN: '-';
ADD: '+';
OPEN_PAR: '(';
CLOSE_PAR: ')';

NUM: '0' | [1-9][0-9]*;
fragment ID_NODE: [a-zA-Z_][a-zA-Z0-9]*;
ID: ID_NODE ('.' ID_NODE)*;
COMMENT: '//' ~[\r\n]* -> skip;
WS: [ \t\r\n]+ -> skip;

(Now that I've read through the comments, I see this is pretty much just applying the suggestions made there.)
