I have defined an ANTLR4 grammar like this:
catSearch : (NOT? CATEGORY expr)+ | (OPEN_BR (catSearch | booleSearch | TERM*)+ CLOSE_BR) ;
expr : (NOT? searchValue)+ | BETWEEN;
searchValue : (TERM | PHRASE | NULL | NOT_NULL ) ;
CATEGORY : ([Aa][Dd] | [Xx][Ii])'=';
// Brackets
OPEN_BR: '(' ;
CLOSE_BR: ')' ;
// boolean operators
AND : ([Aa][Nn][Dd]) ;
OR : ([Oo][Rr]) ;
NOT : ([Nn][Oo][Tt]) ;
NULL: 'NULL' ;
NOT_NULL: 'NNULL' ;
BETWEEN: TERM'^'TERM ;
// match single search term
TERM : ~['('')''='' ''^']+ ;
// any double quoted string
PHRASE : '"' .*? '"' ;
// skip spaces, tabs, newlines
WS : [ \t\r\n]+ -> skip ;
In the rule catSearch, ANTLR reports an error saying that TERM can match an empty string. How can I define TERM so that it matches at least one character that is not in the list of forbidden characters, and never the empty string?
Characters inside a set do not need to be quoted individually; you probably want ~[()= ^] instead. You do not do this in the definition of AND or WS. Yet this pattern cannot work, because it will consume almost any input. In particular, it will consume the tokens AND and NULL. It would be better to define what a term can truly contain. – CoronA
Are you sure the error says that TERM can match the empty string, and not TERM*? I don't see the point of TERM*, since it produces the same matches as TERM but makes the parser's work impossible: it cannot tell how many TERMs are expected. (It also doesn't make sense to have an optional element in a repeated list of alternatives, for the same reason: the parser can't tell how many elements have been omitted.) – rici
NO_LOWERCASE_LETTERS : (~('a'..'z'))+; // Matches all character strings excluding those with lowercase letters and the empty string
Thanks. – Uwe Allner
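A minimal sketch of how the two comments above could be applied to the grammar, not a definitive fix. It makes several assumptions that are not in the original: the grammar name CatSearchSketch is hypothetical, the booleSearch alternative is left out because that rule is not shown in the question, PHRASE is moved ahead of TERM, and TERM additionally excludes the double quote, tabs and newlines so that it does not swallow quoted phrases or whitespace.

```antlr
grammar CatSearchSketch;   // hypothetical name, chosen for this sketch

// TERM* inside ( ... )+ is replaced by TERM: the outer + already allows
// repetition, and no alternative in the closure can match the empty string.
// The booleSearch alternative is omitted because it is not defined in the question.
catSearch
    : (NOT? CATEGORY expr)+
    | OPEN_BR (catSearch | TERM)+ CLOSE_BR
    ;

expr        : (NOT? searchValue)+ | BETWEEN ;
searchValue : TERM | PHRASE | NULL | NOT_NULL ;

CATEGORY : ([Aa][Dd] | [Xx][Ii]) '=' ;

// Brackets
OPEN_BR  : '(' ;
CLOSE_BR : ')' ;

// Boolean operators and keywords are listed before TERM, so they win
// when both rules match the same text at the same length.
AND      : [Aa][Nn][Dd] ;
OR       : [Oo][Rr] ;
NOT      : [Nn][Oo][Tt] ;
NULL     : 'NULL' ;
NOT_NULL : 'NNULL' ;

BETWEEN  : TERM '^' TERM ;

// Any double-quoted string
PHRASE   : '"' .*? '"' ;

// Unquoted character set, as suggested in the comment; the extra exclusions
// (double quote, tab, CR, LF) are assumptions made for this sketch.
TERM     : ~[()= ^"\t\r\n]+ ;

// Skip spaces, tabs, newlines
WS       : [ \t\r\n]+ -> skip ;
```

Because the lexer prefers the longest match, an input such as ANDY would still be tokenized as TERM rather than AND followed by TERM, so a positive definition of the characters a term may contain may still be the safer choice.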