0
votes

I have defined an ANTLR4 grammar like this:

catSearch : (NOT? CATEGORY expr)+ | (OPEN_BR (catSearch | booleSearch | TERM*)+ CLOSE_BR) ;

expr : (NOT? searchValue)+ | BETWEEN;

searchValue : (TERM | PHRASE | NULL | NOT_NULL ) ;

CATEGORY : ([Aa][Dd] | [Xx][Ii])'=';

// Brackets
OPEN_BR: '(' ;
CLOSE_BR: ')' ;

// boolean operators
AND : ([Aa][Nn][Dd]) ;
OR : ([Oo][Rr]) ;
NOT : ([Nn][Oo][Tt]) ;

NULL: 'NULL' ;
NOT_NULL: 'NNULL' ;

BETWEEN: TERM'^'TERM ;

// match single search term
TERM : ~['('')''='' ''^']+ ;

// any double quoted string
PHRASE : '"' .*? '"' ;  

// skip spaces, tabs, newlines
WS : [ \t\r\n]+ -> skip ;

For the rule catSearch, ANTLR reports an error saying that TERM can match the empty string. How can I define TERM so that it matches one or more characters that are not in the list of forbidden characters, but never the empty string?

1
Inside a bracketed character set you should not quote the characters; try ~[()= ^] instead. (You don't quote them in the definitions of AND or WS.) Still, this pattern cannot work well, because it will consume almost any input; in particular, it will consume the tokens AND and NULL. It would be better to define what a term can actually contain. - CoronA
Yes, it consumes nearly every input that has not been matched by the rules defined before it. That is the intention; I cannot define it positively, because anything that does not use these five characters forms a TERM. - Uwe Allner
Are you sure it is saying that TERM can match the empty string, and not TERM*? I don't see the point of TERM*, since it produces the same matches as TERM but makes the parser's work impossible: it cannot tell how many TERMs are expected. (For the same reason, it doesn't make sense to have an optional element in a repeated list of alternatives: the parser can't tell how many elements have been omitted.) - rici
@rici Well, you are right. It has to be TERM+ instead of TERM*, as it has to occur at least once. I was confused by a tutorial which stated: NO_LOWERCASE_LETTERS : (~('a'..'z'))+; // Matches all character strings excluding those with lowercase letters and the empty string. Thanks. - Uwe Allner
@uwe: ok, made it an answer for posterity. - rici

1 Answer

1
votes

I believe that ANTLR is telling you that TERM* can match the empty string, not that TERM can. TERM itself cannot match the empty string, but TERM* can, and that causes a problem in catSearch:

catSearch : ... (OPEN_BR (catSearch | booleSearch | TERM*)+ CLOSE_BR) ;

ANTLR can't handle repetitions of patterns which can match the empty string, because the repetition is completely ambiguous. It could match an arbitrary number of empty strings at any point, so there is no way to know even how many repetitions are to be matched.

If you change the inner repetition from

 (catSearch | booleSearch | TERM*)+ 

to

 (catSearch | booleSearch | TERM)+ 

it will match exactly the same strings, but unambiguously.
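
For illustration, here is a sketch of how the changed parser rule might look in full. Only the TERM* inside the parenthesised alternative changes. Note that booleSearch is referenced in the question but never shown, so it is assumed here to be defined elsewhere in the grammar; the layout is just one possible formatting.

```antlr
// Sketch of the corrected parser rule; the rest of the grammar stays as posted.
// Assumption: booleSearch is defined elsewhere (it is not shown in the question).
catSearch
    : (NOT? CATEGORY expr)+
    | OPEN_BR (catSearch | booleSearch | TERM)+ CLOSE_BR   // TERM, not TERM*
    ;
```

Because the enclosing + already allows any number of consecutive TERM tokens, writing TERM+ inside the group (as suggested in the comments) should also avoid the empty-match error, but it adds nothing over plain TERM.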