ANTLR4 - Language with undistinguishable tokens

Question

I'm developing a grammar for an old language.

The language is quite complex, but I want to focus on a specific issue, so I made a light version of it. The light version allows assignment statements and simple expressions such as arithmetic operations or string concatenation.

Like this:

@assign[@var1 (1+3)*2]
@assign[@var2 "foo" $ "bar"]

Note: Inside an assignment statement, the leading @ char on a variable name is optional. The statement can also be written on multiple lines, so the following assignments are equivalent:

@assign[@var2 "foo" $ "bar"]

@assign[var2 "foo" $ "bar"]

@assign
[@var2 "foo" 
$ "bar"]

@assign
[var2 "foo" 
$ "bar"]

In this language you can also print the value of a variable. The problem is that there is no dedicated command (like @print[...]); simply writing the variable is enough. Like this:

@var1 @var2

So the output for the code

@assign[@var1 (1+3)*2]
@assign[@var2 "foo" $ "bar"]
@var1 @var2

is:

8 foobar

Here is the grammar that I've written so far, starting from the Mu grammar file:

grammar Grammar;

////////////////
//   PARSER   //
////////////////

file
 : block EOF
 ;

block
 : stat*
 ;

stat
 : assignment
 | print
 ;

assignment
 : ASSIGN LBRACKET variable expr RBRACKET
 ;

print
 : AT ID
 ;

expr
 : expr CONCAT expr #concatExpr
 | expr MUL expr    #mulExpr
 | expr DIV expr    #divExpr
 | expr ADD expr    #addExpr
 | expr SUB expr    #subExpr
 | atom             #atomExpr
 ;

variable
 : AT ID
 | ID
 ;

atom
 : LPARENS expr RPARENS  #parExpr
 | INT                   #intAtom
 | STRING                #stringAtom
 | variable              #variableAtom
 ;

///////////////
//   LEXER   //
///////////////

ASSIGN : AT 'assign' ;

AT : '@' ;

ID : [a-zA-Z_] [a-zA-Z_0-9]* ;

INT
 : [0-9]+
 ;

LBRACKET : '[' ;
RBRACKET : ']' ;
LPARENS : '(' ;
RPARENS : ')' ;

CONCAT : '$' ;
ADD : '+' ;
SUB : '-' ;
MUL : '*' ;
DIV : '/' ;

WS : [ \t\r\n] -> skip ;

COMMENT : '[*' .*? '*]' -> skip ;

STRING : '"' (~["\r\n] | '""')* '"' ;

To print the variables I wrote a custom visitor. In its visitPrint method, I know that there are two tokens: AT and ID.

Now the question.

How can I modify my grammar so that the following example code

@assign[@var1 "one"]
@assign[var2 "two"]
@assign[var3 var1 $ var2] 
Value of var3 is: @var3

generates this output?

Value of var3 is: onetwo

The goal is to make the grammar able to print free text.

I imagine that I have to rewrite the print rule. But... how?

print
 : AT ID
 | ?????? //Help!
 ;

In this case, the goal is also that "Value of var3 is: " should be a single token (not one token for each word).

This is surely the wrong way!

print
 : AT ID
 | .+?
 ;

Thanks in advance.

Ken Homer · Accepted Answer · 2013-07-24T17:15:04

This looks similar to the example of separating XML tags from text in Chapter 12.3 of Parr's "The Definitive ANTLR 4 Reference". He uses modes in the lexer to switch token output between inside XML tags and outside them (i.e. in plain text).

In your case, "@assign" and "]" appear to act as your tag delimiters (mode 1). Everything outside them is plain text, which you can copy to the output once the variable references in it have been recognized.
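
As a rough illustration of that idea, here is a minimal sketch, not a tested solution. Lexer modes are only allowed in a lexer grammar, so the combined grammar from the question is split into two files. The names AssignLexer, AssignParser, STATEMENT, PRINT_VAR, TEXT and LONE_AT are hypothetical, and the sketch assumes that outside @assign[...] neither whitespace nor [* ... *] comments are skipped, so they end up inside TEXT tokens.

```antlr
lexer grammar AssignLexer;

// Default mode: free text. Only "@assign" and "@name" are special here.
// Assumption: whitespace and [* ... *] comments are NOT skipped in this mode.
ASSIGN    : '@assign' -> pushMode(STATEMENT) ;
PRINT_VAR : '@' [a-zA-Z_] [a-zA-Z_0-9]* ;   // e.g. @var3, '@' included
TEXT      : ~[@]+ ;                          // one token per run of free text
LONE_AT   : '@' -> type(TEXT) ;              // '@' not followed by a name

// Statement mode: the token rules from the question, active after "@assign".
mode STATEMENT;

COMMENT  : '[*' .*? '*]' -> skip ;
LBRACKET : '[' ;
RBRACKET : ']' -> popMode ;                  // back to free text
LPARENS  : '(' ;
RPARENS  : ')' ;
CONCAT   : '$' ;
ADD      : '+' ;
SUB      : '-' ;
MUL      : '*' ;
DIV      : '/' ;
AT       : '@' ;
ID       : [a-zA-Z_] [a-zA-Z_0-9]* ;
INT      : [0-9]+ ;
STRING   : '"' (~["\r\n] | '""')* '"' ;
WS       : [ \t\r\n]+ -> skip ;
```

```antlr
parser grammar AssignParser;

options { tokenVocab=AssignLexer; }

file       : stat* EOF ;
stat       : assignment | print | text ;
assignment : ASSIGN LBRACKET variable expr RBRACKET ;
print      : PRINT_VAR ;   // the visitor strips the leading '@'
text       : TEXT ;        // the visitor copies this to the output

expr
 : expr CONCAT expr #concatExpr
 | expr MUL expr    #mulExpr
 | expr DIV expr    #divExpr
 | expr ADD expr    #addExpr
 | expr SUB expr    #subExpr
 | atom             #atomExpr
 ;

variable : AT ID | ID ;

atom
 : LPARENS expr RPARENS  #parExpr
 | INT                   #intAtom
 | STRING                #stringAtom
 | variable              #variableAtom
 ;
```

With this split, "Value of var3 is: " reaches the parser as part of a single TEXT token rather than one token per word. One consequence of the assumption above is that the line break following each "]" also becomes text, so the TEXT token in the example starts with that trailing whitespace, and a visitor would probably need to decide whether to trim it before printing.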
