2
votes

I want to create a simple criteria expression parser with antlr3

Updated: separate AND OR expression rules to support AND/OR different hierarchy, but got another problems: if the expression is something like: a = 1 and b = 2 and c = 3 The tree should be as following according to current implement:

       =      =
 (a = 1)(b = 2)(c = 3)
But I want to generate it as follows:
          =       =
    (a = 1)(b = 2)
               (c = 3)
First "and" should be higher priority than another, because I want to parse all the expression as left exp and right exp.

I think I need to re-write the rule in the "subcond" To make a = 1 and b = 2 and c = 3 -> (a = 1 and b = 2) and c = 3

but tried many times with no luck. Has anybody got an idea how to achieve it? Thanks.


My goal is to parse some kind of SQL where clause style sentence, and build a AST to walk through.

For example:

    a = 1 and (b = 2 or c = 3)            //This one can parse correctly.
    a = 1 and ((b = 2 or c = 3) or d = 4) //This one cannot parse correctly, missing last d = 4 in the tree. 
                                          //Tree is not correct.

My current grammar file cannot parse above complex condition. For I'm newbie for antlr, not sure how to modify my grammar to achieve above approach more correctly. Can someone help on this? !Any suggestions or comments are appreciate.

and my grammar as follows (Updated according to the comments. Warning issue resolved.):

grammar CriteriaExpression;

options {
  output       = AST;
  ASTLabelType = CommonTree;
  language     = Java;
}

tokens {
  AND    = 'and';
  OR     = 'or';
  LPAREN = '(';
  RPAREN = ')';
}

@lexer::header {
package com.antlr;
}

@parser::header {
package com.antlr;
}

eval
:
expression
;

expression : andExp (OR^ andExp)* ;

andExp : subcond (AND^ subcond)* ;

subcond : LPAREN expression RPAREN |atom ;

atom
  :
  EXPR OPERATOR EXPR
  ;

OPERATOR
  :
  '='| '<>'| '!='| '<='| '!>'| '<'| '>='| '!<'| '>'| 'like'
  ;

EXPR
  :
  ('a'..'z'| 'A'..'Z'| '0'..'9')+
  ;

 WILDCARD
  :
  '%'
  ;

WS
  :
  ('\t'| ' '| '\r'| '\n'| '\u000C')*
   {$channel = HIDDEN;}
  ;

((a=1)) ((a=1))

a = 1 and ((b = 2 or c = 3) or d = 4) a = 1 and ((b = 2 or c = 3) or d = 4)

2
show incoming file for your exampleAleksei Bulgak
Hi, @Aleksei Bulgak, what is your mean incoming file? The example is just possible value I think about. The real input string could be much more complex and could be mix combination. Thanks.phyerbarte
you give this link in-complete tree. with treeAleksei Bulgak
@Aleksei Bulgak, the in-complete tree is generated by antlrworks 1.4.3 with the example a = 1 and ((b = 2 or c = 3) or d = 4), I think the tree is missing the last part d=4 and not sure how to fix it.phyerbarte
Instead of completely rewriting your already answered question, please create a new one. Or is your original question not answered?Bart Kiers

2 Answers

2
votes

One flaw in your grammar is the rule

expression
  :
  LPAREN* subcond RPAREN* (( AND | OR )^ LPAREN* subcond RPAREN*)
  ;

Since you can have any number of LPAREN or RPAREN, there is no guarantee they are matched. I suggest using somehting like

expression
  : subcond (( AND | OR ) subcond)?
  | subcond
  ;

and for subcond

subcond
  : atom (( AND | OR )^ atom)*
  | LPAREN expression RPAREN
  ;

Ideally, you should also have separate rules for AND and OR expressions to have the correct precedence in your parse tree.

Update: In your updated grammar, again you are using LPAREN* and RPAREN* which won't give you properly balanced trees. You need to model multiple parens like ((a = 1)) with recursion, like I described in my example above. This would give a tree like

((a = 1))
  ^---^--- ATOM
 ^-----^-- Subcond -> Expression
^-------^- Subcond -> Expression

So the tree would be like that:

Expression "((a = 1))"
^
Subcond "(a = 1)"
^
Expression "(a = 1)"
^
Subcond "a = 1"
^
ATOM "a = 1"
2
votes

May be I'm wrong butI think you problem connected with this thing LPAREN* something RPAREN* you can write comething like this ((something ) and antlr think that this write because LParent and Rparent have not connected to each other so may be use something like this

COMPLEX:
    LPARENT (COMPLEX|subcond) RPARENT;

But I will say it again, maybe I'm wrong

UPDATE

change this:

subcond
  : 
  //atom (( AND | OR )^ atom)*
  LPAREN* atom RPAREN*
  ;

to this:

subcond
  : 
  LPAREN (subcond|atom) RPAREN
  ;

using this you can now write something like this ((a=1))