OK, I've tried everything before coming here to ask, but this is driving me crazy.

I'm creating a simple language for querying documents in a custom NoSQL database. A sample query looks like this:

VALUE("price: " SUM($price) " Average: " AVG($price)).MATCH($price > 5 OR $price < 100 OR $cost > 30)

It is somewhere between SQL and MongoDB's aggregation queries: the argument to VALUE concatenates the strings and the aggregation results, and MATCH takes a boolean condition.

The problem is that when I parse this, I get "line 1:69 extraneous input ' ' expecting COMPARATOR" followed by "line 1:69 no viable alternative at input ' '". The same errors appear at columns 80-82 and 95-97.

As you can see, the problem is around the comparators ('<', '>', etc.). I've been looking through my grammar for conflicts or ambiguities without any luck (admittedly, I only got into ANTLR very recently).

Here's my grammar:

// Define a grammar called Capsa
grammar Capsa;

eval : VARIABLE | function;

function : functionValue;

functionValue : 'VALUE(' (STRING ' ')* functionNumber (' '(STRING|functionNumber))* ')' (match)?;

match : '.MATCH(' booleanexpression ')';

functionNumber: FUNCTIONNUMBERTYPE'(' value ')';
FUNCTIONNUMBERTYPE: 'SUM'|'AVG'|'MAX'|'MIN'|'FIRST'|'LAST' ;

value
  : VARIABLE          #Var
  | REALNUMBER        #Literal
  | STRING            #Literal
  | calcexpression    #Calc
  | booleanValue      #Literal;

/*
** Boolean stuff
*/
AND : '&&' | ' AND ';
OR : '||' | ' OR ';
NOT : '!' | ' NOT ';

booleanexpression : '(' booleanexpression ')'   #BooleanParentExpression
  | booleanexpression AND booleanexpression     #AndExpression
  | booleanexpression OR booleanexpression      #OrExpression
  | NOT booleanexpression                       #NotExpression
  | (value COMPARATOR value)                    #Comparison
  | booleanValue                                #ComparisonLogic;

booleanValue
  : 'true'
  | 'false';
/*
** Comparators
*/
fragment GT : '>';
fragment GTE : '>=';
fragment LT : '<';
fragment LTE : '<=';
fragment EQ : '=';
fragment EX : ':' | '==';
COMPARATOR : GT | GTE | LT | LTE | EQ | EX;
/*
** End Comparators
*/

/*
** End Boolean stuff
*/

/*
** Calc
*/
calcexpression
  : '(' calcexpression ')'                    #CalcParentExpression
  | calcexpression ('*'|'/') calcexpression   #MultOrDiv
  | calcexpression ('+'|'-') calcexpression   #AddOrSub
  | VARIABLE                                  #CalcID
  | REALNUMBER                                #CalcNumber;

/*
** End Calc
*/

fragment ID : [a-zA-Z_][a-zA-Z0-9_]+ ;
VARIABLE : '$'ID;
STRING : '"' (ESC | ~["\\])* '"' ;
fragment CONSTANT : STRING | REALNUMBER;

fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
fragment INT : [0-9]+ ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]

REALNUMBER
: '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

The only solution I've found so far is to change the line that says:

  | (value COMPARATOR value)                    #Comparison

to:

  | (value ' '* COMPARATOR ' '* value)                    #Comparison

But that looks more like a hack than a solution to me...

What am I missing? I'm pretty sure it will be something quite dumb... but I've spent the whole day on this without luck...

Bonus track:

(This one is less important.) I'm also trying to allow calc expressions in the boolean queries (like 5+3 > 6 or $variable+10 < 100), but in that case parsing breaks completely: the parser expects a comparator ('>', '<', ...) at the point where the operator ('+', '-', ...) appears.

1 Answer

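You are skipping whitespace (WS -> skip), so why do you match ' ' in your grammar rule functionValue? A literal used in a parser rule becomes an implicit token that is defined ahead of your explicit lexer rules, so a lone space is emitted as a ' ' token instead of being skipped as WS. The parser then trips over those tokens around the comparators.

Remove these parts and you get a complete working grammar (for your given example), including correct parsing of the calculation expressions.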

The rule is now:

functionValue : 'VALUE(' (STRING)* functionNumber ((STRING|functionNumber))* ')' (match)?;
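
To see why the ' ' literal matters, here is a rough Python simulation of the two rules the lexer applies: the longest match wins, and on a tie the rule defined first wins. This is only a sketch, not ANTLR itself. The tokenize helper, the rule tables and the simplified regexes are assumptions made for illustration, and ANTLR would name the implicit token something like T__0 rather than ' '.

```python
import re

def tokenize(text, rules):
    """Maximal-munch sketch: longest match wins, ties go to the earliest rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at column {pos}")
        if best_name != "WS":  # WS mimics '-> skip': consumed but not emitted
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Explicit lexer rules, heavily simplified from the grammar in the question.
EXPLICIT = [
    ("OR", r"\|\|| OR "),
    ("COMPARATOR", r">=|<=|==|[<>=:]"),
    ("VARIABLE", r"\$[a-zA-Z_][a-zA-Z0-9_]+"),
    ("REALNUMBER", r"-?[0-9]+"),
    ("WS", r"[ \t\r\n]+"),
]

# The ' ' literal in functionValue acts like an extra rule placed before all others.
WITH_SPACE_LITERAL = [("' '", r" ")] + EXPLICIT

QUERY = "$price > 5 OR $cost > 30"

if __name__ == "__main__":
    print([name for name, _ in tokenize(QUERY, WITH_SPACE_LITERAL)])
    print([name for name, _ in tokenize(QUERY, EXPLICIT)])
```

With the literal in place, each lone space around '>' comes out as its own ' ' token, which is what the parser complains about. Without it, those spaces fall through to WS and disappear, while ' OR ' still wins as a single token because it is the longer match.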

Have fun with ANTLR4, it is a very nice tool.

PS: Think about splitting your grammar into a parser grammar and a lexer grammar; that gives you two files that are easier to read.

Their headers will be:

CapsaParser.g4

parser grammar CapsaParser;
options { tokenVocab = CapsaLexer; }

CapsaLexer.g4

lexer grammar CapsaLexer;