
I'm creating a simple boolean query parser with ANTLR. I would like to do something like the grammar below.

grammar BooleanQuery;

options
{
  language = Java;
  output = AST;
}

LPAREN : ( '(' ) ;
RPAREN : ( ')' );
QUOTE  : ( '"' );
AND : ( 'AND' | '&' | 'EN' | '+' ) ;
OR : ( 'OR' | '|' | 'OF' );
WS :  ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}  ;
WORD :  (~( ' ' | '\t' | '\r' | '\n' | '(' | ')' | '"' ))*;
MINUS  : '-';
PLUS  : '+';


expr : andexpr;
andexpr : orexpr (AND^ orexpr)*;
orexpr : part (OR^ part)*;
phrase  : QUOTE ( options {greedy=false;} : . )* QUOTE;
requiredexpr : PLUS atom;
excludedexpr : MINUS atom;
part : excludedexpr | requiredexpr | atom;
atom : phrase | WORD | LPAREN! expr RPAREN!;

The problem is that the MINUS and PLUS tokens 'collide' with the minus and plus signs in the AND and OR tokens. Sorry if I'm not using the correct terminology; I'm an ANTLR newbie.

Below is an example query:

foo OR (pow AND -"bar with cream" AND -bar)

What mistakes did I make?

1 Answer


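A token must be unique: the same literal cannot belong to two lexer rules, and in your grammar '+' appears in both AND and PLUS. You can, however, use the same token for several purposes in your syntax (like the unary and binary minus in Java).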

I do not know the exact syntax of your environment, but changing the following two clauses to something like

AND : ( 'AND' | '&' | 'EN' ) ;

and

andexpr : orexpr ((AND^ | PLUS^) orexpr)*;

would probably solve this issue.
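
To illustrate the idea outside of ANTLR, here is a minimal hand-written sketch in Java (the grammar's target language). It is not what ANTLR would generate: the class name BooleanQuerySketch, the bracketed tree strings, and the REQUIRED/NOT labels are all made up for this example, and quoted phrases are left out to keep it short. The point it shows is that '+' is lexed as exactly one token, and the parser decides from its position whether it means a binary AND or a unary "required" prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny hand-written lexer and parser,
// not code generated by ANTLR. Quoted phrases are omitted for brevity.
class BooleanQuerySketch {
    private static final String SPECIALS = "()+-&|";
    private final List<String> tokens;
    private int pos = 0;

    private BooleanQuerySketch(List<String> tokens) { this.tokens = tokens; }

    // Lexer: each character sequence maps to exactly one token.
    // '+' is always the single token "+", never "AND or PLUS".
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (SPECIALS.indexOf(c) >= 0) { out.add(String.valueOf(c)); i++; continue; }
            int start = i;
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))
                    && SPECIALS.indexOf(input.charAt(i)) < 0) i++;
            out.add(input.substring(start, i));
        }
        return out;
    }

    static String parse(String input) {
        BooleanQuerySketch p = new BooleanQuerySketch(tokenize(input));
        String tree = p.andExpr();
        if (p.pos != p.tokens.size())
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private static boolean isAnd(String t) {
        return t != null && (t.equals("AND") || t.equals("&") || t.equals("EN") || t.equals("+"));
    }

    private static boolean isOr(String t) {
        return t != null && (t.equals("OR") || t.equals("|") || t.equals("OF"));
    }

    // andexpr : orexpr ((AND | PLUS) orexpr)*   ('+' used as a binary operator)
    private String andExpr() {
        String left = orExpr();
        while (isAnd(peek())) { pos++; left = "(AND " + left + " " + orExpr() + ")"; }
        return left;
    }

    // orexpr : part (OR part)*
    private String orExpr() {
        String left = part();
        while (isOr(peek())) { pos++; left = "(OR " + left + " " + part() + ")"; }
        return left;
    }

    // part : MINUS atom | PLUS atom | atom      ('+' used as a unary prefix)
    private String part() {
        String t = peek();
        if ("-".equals(t)) { pos++; return "(NOT " + atom() + ")"; }
        if ("+".equals(t)) { pos++; return "(REQUIRED " + atom() + ")"; }
        return atom();
    }

    // atom : WORD | LPAREN expr RPAREN
    private String atom() {
        String t = peek();
        if (t == null) throw new IllegalArgumentException("unexpected end of input");
        if (t.equals(")") || t.equals("-") || isAnd(t) || isOr(t))
            throw new IllegalArgumentException("unexpected token: " + t);
        pos++;
        if (!t.equals("(")) return t;
        String inner = andExpr();
        if (!")".equals(peek())) throw new IllegalArgumentException("expected ')'");
        pos++;
        return inner;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo OR (pow AND -bar)"));
    }
}
```

With this sketch, "a + b" is read as a binary AND, while "+a AND b" treats the leading '+' as the unary prefix, even though the lexer produced the same "+" token in both cases. As in the grammar from the question, OR binds more tightly than AND here.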