3
votes

I'm writing a grammar that supports arbitrary boolean expressions. The grammar is used to represent a program, which is later passed through the static analysis tool. The static analysis tool has certain limitations so I want to apply the following rewrite rules:

Strict inequalities are approximated with epsilon:

expression_a > expression_b -> expression_a >= expression_b + EPSILON

Inequality is approximated using "or" statement:

expression_a != expression_b -> expression_a > expression_b || expression_a < expression_b

Is there any easy way to do it using ANTLR? Currently my grammar looks like so:

comparison          : expression ('=='^|'<='^|'>='^|'!='^|'>'^|'<'^) expression;

I'm not sure how to apply a different rewrite rule depending on what the operator is. I want to tree stay as it is if the operator is ("==", "<=" or ">=") and to recursively transform it otherwise, according to the rules defined above.

1

1 Answers

3
votes

[...] and to recursively transform it otherwise, [...]

You can do it partly.

You can't tell ANTLR to rewrite a > b to ^('>=' a ^('+' b epsilon)) and then define a != b to become ^('||' ^('>' a b) ^('<' a b)) and then have ANTLR automatically rewrite both ^('>' a b) and ^('<' a b) to ^('>=' a ^('+' b epsilon)) and ^('<=' a ^('-' b epsilon)) respectively.

A bit of manual work is needed here. The trick is that you can't just use a token like >= if this token isn't actually parsed. A solution to this is to use imaginary tokens.

A quick demo:

grammar T;

options {
  output=AST;
}

tokens {
  AND;
  OR;
  GTEQ;
  LTEQ;
  SUB;
  ADD;
  EPSILON;
}

parse
 : expr
 ;

expr
 : logical_expr
 ;

logical_expr
 : comp_expr ((And | Or)^ comp_expr)*
 ;

comp_expr
 : (e1=mult_expr -> $e1) ( Eq   e2=mult_expr -> ^(AND ^(GTEQ $e1 $e2) ^(LTEQ $e1 $e2))
                         | LtEq e2=mult_expr -> ^(LTEQ $e1 $e2)
                         | GtEq e2=mult_expr -> ^(GTEQ $e1 $e2)
                         | NEq  e2=mult_expr -> ^(OR ^(GTEQ $e1 ^(ADD $e2 EPSILON)) ^(LTEQ $e1 ^(SUB $e2 EPSILON)))
                         | Gt   e2=mult_expr -> ^(GTEQ $e1 ^(ADD $e2 EPSILON))
                         | Lt   e2=mult_expr -> ^(LTEQ $e1 ^(SUB $e2 EPSILON))
                         )?
 ;

add_expr
 : mult_expr ((Add | Sub)^ mult_expr)*
 ;

mult_expr
 : atom ((Mult | Div)^ atom)*
 ;

atom
 : Num
 | Id
 | '(' expr ')'
 ;

Eq    : '==';
LtEq  : '<=';
GtEq  : '>=';
NEq   : '!=';
Gt    : '>';
Lt    : '<';
Or    : '||';
And   : '&&';
Mult  : '*';
Div   : '/';
Add   : '+';
Sub   : '-';
Num   : '0'..'9'+ ('.' '0'..'9'+)?;
Id    : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};

The parser generated from the grammar above will produce the following:


a == b

enter image description here


a != b

enter image description here


a > b

enter image description here


a < b

enter image description here