I would like to create an ANTLR parser for a custom language and decided to pick a simple calculator as an example. In my new grammar it should be possible to define a string, like this:

s = "Hello, I am a string"

and to handle string interpolation. Text inside the double quotes that is enclosed in percent signs should be treated as an interpolated expression, e.g.

s = "Hello, did you know that %2 + 2% is 4?"

A double percent sign should not be processed as interpolation, e.g.

s = "He wants 50%% of this deal."

But at the same time my calculator should support the modulus operation:

x = 5 % 2

So far I have been able to craft a lexer and a parser grammar that switch modes and parse simple strings. Here they are:

lexer grammar CalcLexer;

EQ: '=';
PLUS: '+';
MINUS: '-';
MULT: '*';
DIV: '/';

LPAREN : '(' ;
RPAREN : ')' ;

SINGLE_PERCENT_POP: '%' -> popMode;

ID  :   [a-zA-Z]+ ;
INT :   [0-9]+ ;

OPEN_DOUBLE_QUOTE: '"' -> pushMode(STRING_MODE);

NEWLINE:'\r'? '\n' ;
WS  :   [ \t]+ -> skip;


mode STRING_MODE;
DOUBLE_PERCENT: '%%';
SINGLE_PERCENT: '%' -> pushMode(DEFAULT_MODE);
TEXT: ~('%'|'\n'|'"')+;


CLOSE_DOUBLE_QUOTE: '"' -> popMode;

and

parser grammar CalcGrammar;

options { tokenVocab=CalcLexer; } // use tokens from CalcLexer.g4

prog:   stat+ ;

stat:   expr NEWLINE
    |   ID EQ (expr|text) NEWLINE
    |   NEWLINE
    ;

text: OPEN_DOUBLE_QUOTE content* CLOSE_DOUBLE_QUOTE;

content: DOUBLE_PERCENT | TEXT | SINGLE_PERCENT expr SINGLE_PERCENT_POP;

expr:   expr (MULT|DIV) expr
    |   expr (PLUS|MINUS) expr
    |   INT
    |   ID
    |   LPAREN expr RPAREN
    ;

The only thing that doesn't work, and I'm not sure whether it is even possible to implement without custom code (members), is the modulus operation:

x = 5 % 2

As far as I can tell, there is no way to ask ANTLR to check the previous mode and pop it safely. But I hope my understanding is wrong and there is some way to treat the % sign as an operator in the default mode?
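To make the problem concrete, here is a small Python sketch that only models the mode stack of the lexer above. It is a toy model, not ANTLR itself: the function name trace_modes and the IndexError are my own assumptions for illustration, and all tokens that do not change the mode are ignored.

```python
# Toy model of the mode stack in the lexer above (not ANTLR itself).
DEFAULT, STRING = "DEFAULT_MODE", "STRING_MODE"

def trace_modes(text):
    """Return (character, mode after it) for every mode-changing character."""
    mode, stack, trace = DEFAULT, [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if mode == DEFAULT:
            if ch == '"':
                # OPEN_DOUBLE_QUOTE -> pushMode(STRING_MODE)
                stack.append(mode)
                mode = STRING
                trace.append((ch, mode))
            elif ch == "%":
                # SINGLE_PERCENT_POP -> popMode
                if not stack:
                    raise IndexError(f"popMode on an empty mode stack at index {i}")
                mode = stack.pop()
                trace.append((ch, mode))
        else:
            if text.startswith("%%", i):
                # DOUBLE_PERCENT: consumed as one token, no mode change
                i += 2
                continue
            if ch == "%":
                # SINGLE_PERCENT -> pushMode(DEFAULT_MODE)
                stack.append(mode)
                mode = DEFAULT
                trace.append((ch, mode))
            elif ch == '"':
                # CLOSE_DOUBLE_QUOTE -> popMode
                mode = stack.pop()
                trace.append((ch, mode))
        i += 1
    return trace
```

With this model, trace_modes('"a %2 + 2% b"') walks through STRING_MODE, DEFAULT_MODE, STRING_MODE and back to DEFAULT_MODE, while trace_modes('x = 5 % 2') fails because the % in the default mode tries to pop a mode that was never pushed.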

I have found several sources for inspiration, probably they would help you as well:

1 Answer

Murphy's law for StackOverflow: you will find the answer to your own question a few minutes after posting a detailed question to SO.

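Instead of switching back to DEFAULT_MODE, I should create a separate mode, STRING_INTERPOLATION. This means defining separate tokens for that mode, which leaves the % sign free to act as the modulus operator in the default mode (and makes it unavailable as an operator inside an interpolated expression, where it closes the interpolation instead).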
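The idea can be sketched with a tiny hand-written modal tokenizer in Python. This is only an illustration of the three-mode design, not the ANTLR runtime: the tokenize function, the rule tables and the action tuples are hypothetical names of my own, and the matching rule (longest match, ties go to the rule listed first) is meant to mirror how ANTLR picks a lexer rule.

```python
import re

# Rules shared by the default mode and the interpolation mode:
# (token type, regex, action). An action is None or a tuple.
COMMON = [
    ("PLUS", r"\+", None), ("MINUS", r"-", None),
    ("MULT", r"\*", None), ("DIV", r"/", None),
    ("LPAREN", r"\(", None), ("RPAREN", r"\)", None),
    ("ID", r"[a-zA-Z]+", None), ("INT", r"[0-9]+", None),
    ("WS", r"[ \t]+", ("skip",)),
]

MODES = {
    "DEFAULT_MODE": [
        ("EQ", r"=", None),
        ("MOD", r"%", None),  # '%' is a plain operator here
        ("OPEN_DOUBLE_QUOTE", r'"', ("push", "STRING_MODE")),
        ("NEWLINE", r"\r?\n", None),
    ] + COMMON,
    "STRING_MODE": [
        ("DOUBLE_PERCENT", r"%%", None),
        ("SINGLE_PERCENT", r"%", ("push", "STRING_INTERPOLATION")),
        ("TEXT", r'[^%\n"]+', None),
        ("CLOSE_DOUBLE_QUOTE", r'"', ("pop",)),
    ],
    "STRING_INTERPOLATION": [
        # '%' closes the interpolation, so it cannot be an operator here
        ("SINGLE_PERCENT_POP", r"%", ("pop",)),
    ] + COMMON,
}

def tokenize(text):
    """Return a list of (token type, lexeme) pairs for the given input."""
    mode, stack, tokens, pos = "DEFAULT_MODE", [], [], 0
    while pos < len(text):
        # Longest match wins; ties go to the rule listed first.
        best = None
        for name, pattern, action in MODES[mode]:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            raise ValueError(f"unexpected {text[pos]!r} in {mode}")
        name, m, action = best
        pos = m.end()
        if action == ("skip",):
            continue
        tokens.append((name, m.group()))
        if action is not None and action[0] == "push":
            stack.append(mode)   # remember where to return to
            mode = action[1]
        elif action == ("pop",):
            mode = stack.pop()   # only reached from modes entered via push
    return tokens
```

In this sketch, tokenize("x = 5 % 2") yields a MOD token for the percent sign, while in tokenize('s = "50%% of %2 + 2%"') the same character shows up as DOUBLE_PERCENT, SINGLE_PERCENT and SINGLE_PERCENT_POP depending on the mode. Because pop actions only exist in modes that are entered through a push, the mode stack can never be popped while empty.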

Here are the lexer and the parser grammar that work for me:

lexer grammar CalcLexer;

EQ: '=';
PLUS: '+';
MINUS: '-';
MULT: '*';
DIV: '/';
MOD: '%';

LPAREN : '(' ;
RPAREN : ')' ;

ID  : F_ID;
INT : F_INT;

fragment F_ID: [a-zA-Z]+ ;
fragment F_INT: [0-9]+ ;

OPEN_DOUBLE_QUOTE: '"' -> pushMode(STRING_MODE);

NEWLINE:'\r'? '\n' ;
WS  :   [ \t]+ -> skip;


mode STRING_MODE;
DOUBLE_PERCENT: '%%';
SINGLE_PERCENT: '%' -> pushMode(STRING_INTERPOLATION);
TEXT: ~('%'|'\n'|'"')+;


CLOSE_DOUBLE_QUOTE: '"' -> popMode;

mode STRING_INTERPOLATION;
SINGLE_PERCENT_POP: '%' -> popMode;

I_PLUS: PLUS -> type(PLUS);
I_MINUS: MINUS -> type(MINUS);
I_MULT: MULT -> type(MULT);
I_DIV: DIV -> type(DIV);
I_MOD: MOD -> type(MOD); // never matches in this mode: SINGLE_PERCENT_POP above also matches '%' and is listed first

I_LPAREN: LPAREN -> type(LPAREN);
I_RPAREN: RPAREN -> type(RPAREN);

I_ID  : F_ID -> type(ID);
I_INT : F_INT -> type(INT);

WS1  :   [ \t]+ -> skip;

and

parser grammar CalcGrammar;

options { tokenVocab=CalcLexer; } // use tokens from CalcLexer.g4

prog:   stat+ ;

stat:   expr NEWLINE
    |   ID EQ (expr|text) NEWLINE
    |   NEWLINE
    ;

text: OPEN_DOUBLE_QUOTE content* CLOSE_DOUBLE_QUOTE;

content: DOUBLE_PERCENT | TEXT | SINGLE_PERCENT expr SINGLE_PERCENT_POP;

expr:   expr (MULT|DIV|MOD) expr
    |   expr (PLUS|MINUS) expr
    |   INT
    |   ID
    |   LPAREN expr RPAREN
    ;

I hope this helps someone. Probably future me.